(54) ESTs and encoded human proteins

(57) The sequences of 5' ESTs and consensus contigated 5' ESTs derived from mRNAs encoding secreted proteins are disclosed. The 5' ESTs and consensus contigated 5' ESTs may be used to obtain cDNAs and genomic DNAs corresponding to the 5' ESTs and consensus contigated 5' ESTs. The 5' ESTs and consensus contigated 5' ESTs may also be used in diagnostic, forensic, gene therapy, and chromosome mapping procedures. Upstream regulatory sequences may also be obtained using the 5' ESTs and consensus contigated 5' ESTs. The 5' ESTs and consensus contigated 5' ESTs may also be used to design expression vectors and secretion vectors.
Description

Background of the Invention 

[0001] The estimated 50,000-100,000 genes scattered along the human chromosomes offer tremendous promise for the understanding, diagnosis, and treatment of human diseases. In addition, probes capable of specifically hybridizing to loci distributed throughout the human genome find applications in the construction of high-resolution chromosome maps and in the identification of individuals.

[0002] In the past, the characterization of even a sin- 
gle human gene was a painstaking process, requiring 
years of effort. Recent developments in the areas of 
cloning vectors, DNA sequencing, and computer tech- 
nology have merged to greatly accelerate the rate at 
which human genes can be isolated, sequenced, 
mapped, and characterized. 

[0003] Currently, two different approaches are being 
pursued for identifying and characterizing the genes dis- 
tributed along the human genome. In one approach, 
large fragments of genomic DNA are isolated, cloned, 
and sequenced. Potential open reading frames in these 
genomic sequences are identified using bioinformatics 
software. However, this approach entails sequencing 
large stretches of human DNA which do not encode pro- 
teins in order to find the protein encoding sequences 
scattered throughout the genome. In addition to requir- 
ing extensive sequencing, the bioinformatics software 
may mischaracterize the genomic sequences obtained, 
i.e., labeling non-coding DNA as coding DNA and vice 
versa. 

[0004] An alternative approach takes a more direct route to identifying and characterizing human genes. In this approach, complementary DNAs (cDNAs) are synthesized from isolated messenger RNAs (mRNAs) which encode human proteins. Using this approach, sequencing is only performed on DNA which is derived from protein coding portions of the genome. Often, only short stretches of the cDNAs are sequenced to obtain sequences called expressed sequence tags (ESTs). The ESTs may then be used to isolate or purify extended cDNAs which include sequences adjacent to the EST sequences. The extended cDNAs may contain all of the sequence of the EST which was used to obtain them or only a portion of that sequence. In addition, the extended cDNAs may contain the full coding sequence of the gene from which the EST was derived or, alternatively, only portions of that coding sequence. It will be appreciated that there may be several extended cDNAs which include the EST sequence as a result of alternative splicing or the activity of alternative promoters. Alternatively, ESTs having partially overlapping sequences may be identified, and contigs comprising the consensus sequences of the overlapping ESTs may be assembled.
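The consensus step mentioned above can be illustrated with a minimal Python sketch. It assumes the overlapping ESTs have already been placed at known offsets on the contig by a prior overlap search (not shown), and it applies a simple per-position majority vote. The function name `consensus_contig`, the majority rule, and the use of `N` for uncovered or tied positions are illustrative assumptions, not the assembly procedure of this application.

```python
from collections import Counter


def consensus_contig(placed_ests):
    """Majority-vote consensus of ESTs placed at known contig offsets.

    placed_ests: list of (offset, sequence) pairs. The offsets are assumed to
    come from an earlier overlap/alignment step, which is not shown here.
    Positions covered by no EST, or with a tied vote, are reported as "N".
    """
    length = max(offset + len(seq) for offset, seq in placed_ests)
    columns = [Counter() for _ in range(length)]
    for offset, seq in placed_ests:
        for i, base in enumerate(seq.upper()):
            columns[offset + i][base] += 1

    consensus = []
    for column in columns:
        ranked = column.most_common()
        if not ranked:
            consensus.append("N")  # gap: no EST covers this position
        elif len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            consensus.append("N")  # tie: ambiguous position
        else:
            consensus.append(ranked[0][0])
    return "".join(consensus)


# Three overlapping ESTs plus one carrying a T/C disagreement at contig position 4.
ests = [(0, "ATGGCCA"), (3, "GCCATTG"), (5, "CATTGAC"), (2, "GGTC")]
print(consensus_contig(ests))  # ATGGCCATTGAC
```

A majority vote is the simplest way to resolve sequencing discrepancies between overlapping reads; a real assembly would also weight bases by quality and handle insertions and deletions.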



[0005] In the past, these short EST sequences were often obtained from oligo-dT primed cDNA libraries. Accordingly, they mainly corresponded to the 3' untranslated region of the mRNA. In part, the prevalence of EST sequences derived from the 3' end of the mRNA is a result of the fact that typical techniques for obtaining cDNAs are not well suited for isolating cDNA sequences derived from the 5' ends of mRNAs (Adams et al., Nature 377:3-174, 1996; Hillier et al., Genome Res. 6:807-828, 1996).

[0006] In addition, in those reported instances where longer cDNA sequences have been obtained, the reported sequences typically correspond to coding sequences and do not include the full 5' untranslated region (5'UTR) of the mRNA from which the cDNA is derived. Indeed, 5'UTRs have been shown to affect either the stability or the translation of mRNAs. Thus, regulation of gene expression may be achieved through the use of alternative 5'UTRs, as shown, for instance, for the translation of the tissue inhibitor of metalloprotease mRNA in mitogenically activated cells (Waterhouse et al., J. Biol. Chem. 265:5585-9, 1990). Furthermore, modification of the 5'UTR through mutation, insertion or translocation events may even be implicated in pathogenesis. For instance, fragile X syndrome, the most common cause of inherited mental retardation, is partly due to an insertion of multiple CGG trinucleotides in the 5'UTR of the fragile X mRNA, resulting in the inhibition of protein synthesis via ribosome stalling (Feng et al., Science 268:731-4, 1995). An aberrant mutation in regions of the 5'UTR known to inhibit translation of the proto-oncogene c-myc was shown to result in upregulation of c-myc protein levels in cells derived from patients with multiple myeloma (Willis et al., Curr. Top. Microbiol. Immunol. 224:269-76, 1997). In addition, the use of oligo-dT primed cDNA libraries does not allow the isolation of complete 5'UTRs, since the incomplete sequences obtained by this process may not include the first exon of the mRNA, particularly where the first exon is short. Furthermore, they may omit some exons, often short ones, which are located upstream of splicing sites. Thus, there is a need to obtain sequences derived from the 5' ends of mRNAs.

[0007] While many sequences derived from human chromosomes have practical applications, approaches based on the identification and characterization of those chromosomal sequences which encode a protein product are particularly relevant to diagnostic and therapeutic uses. In some instances, the sequences used in such therapeutic or diagnostic techniques may be sequences which encode proteins that are secreted from the cell in which they are synthesized. Those sequences encoding secreted proteins, as well as the secreted proteins themselves, are particularly valuable as potential therapeutic agents. Such proteins are often involved in cell-to-cell communication and may be responsible for producing a clinically relevant response in their target cells. In fact, several secretory proteins, including tissue plasminogen activator, G-CSF, GM-CSF, erythropoietin, human growth hormone, insulin, interferon-α, interferon-β, interferon-γ, and interleukin-2, are currently in clinical use. These proteins are used to treat a wide range of conditions, including acute myocardial infarction, acute ischemic stroke, anemia, diabetes, growth hormone deficiency, hepatitis, kidney carcinoma, chemotherapy-induced neutropenia and multiple sclerosis. For these reasons, extended cDNAs encoding secreted proteins or portions thereof represent a valuable source of therapeutic agents. Thus, there is a need for the identification and characterization of secreted proteins and the nucleic acids encoding them.

[0008] In addition to being therapeutically useful themselves, secretory proteins include short peptides, called signal peptides, at their amino termini which direct their secretion. These signal peptides are encoded by the signal sequences located at the 5' ends of the coding sequences of genes encoding secreted proteins. These signal peptides can be used to direct the extracellular secretion of any protein to which they are operably linked. In addition, portions of the signal peptides, called membrane-translocating sequences, may also be used to direct the intracellular import of a peptide or protein of interest. This may prove beneficial in gene therapy strategies in which it is desired to deliver a particular gene product to cells other than the cells in which it is produced. Signal sequences encoding signal peptides also find application in simplifying protein purification techniques. In such applications, the extracellular secretion of the desired protein greatly facilitates purification by reducing the number of undesired proteins from which the desired protein must be separated. Thus, there exists a need to identify and characterize the 5' portions of the genes for secretory proteins which encode signal peptides.

[0009] Sequences coding for non-secreted proteins 
may also find application as therapeutics or diagnostics. 
In particular, such sequences may be used to determine 
whether an individual is likely to express a detectable 
phenotype, such as a disease, as a consequence of a 
mutation in the coding sequence of a protein. In instanc- 
es where the individual is at risk of suffering from a dis- 
ease or other undesirable phenotype as a result of a mu- 
tation in such a coding sequence, the undesirable phe- 
notype may be corrected by introducing a normal coding 
sequence using gene therapy. Alternatively, if the unde- 
sirable phenotype results from overexpression of the 
protein encoded by the coding sequence, expression of 
the protein may be reduced using antisense or triple he- 
lix based strategies. 

[0010] The secreted or non-secreted human polypeptides encoded by the coding sequences may also be used as therapeutics by administering them directly to an individual having a condition, such as a disease, resulting from a mutation in the sequence encoding the polypeptide. In such an instance, the condition can be cured or ameliorated by administering the polypeptide to the individual.

[0011] In addition, the secreted or non-secreted human polypeptides or portions thereof may be used to generate antibodies useful in determining the tissue type or species of origin of a biological sample, for example, to distinguish between human and non-human cells and tissues, or to distinguish between human tissues that do and do not express the polypeptides. The antibodies may also be used to determine the cellular localization of the secreted or non-secreted human polypeptides or the cellular localization of polypeptides which have been fused to the human polypeptides. In addition, the antibodies may be used in immunoaffinity chromatography techniques to isolate, purify, or enrich the human polypeptide or a target polypeptide which has been fused to the human polypeptide. Public information on the number of human genes for which the promoters and upstream regulatory regions have been identified and characterized is quite limited. In part, this may be due to the difficulty of isolating such regulatory sequences. Upstream regulatory sequences such as transcription factor binding sites are typically too short to be utilized as probes for isolating promoters from human genomic libraries. Recently, some approaches have been developed to isolate human promoters. One of them consists of making a CpG island library (Cross et al., Nature Genetics 6:236-244, 1994). The second consists of isolating human genomic DNA sequences containing Sp1 binding sites by the use of Sp1 binding protein (Mortlock et al., Genome Res. 6:327-335, 1996). Both of these approaches are limited by a lack of specificity and of comprehensiveness. Thus, there exists a need to identify and systematically characterize the 5' portions of the genes.
[0012] The present 5' ESTs may be used to efficiently 
identify and isolate 5'UTRs and upstream regulatory re- 
gions which control the location, developmental stage, 
rate, and quantity of protein synthesis, as well as the 
stability of the mRNA. Once identified and character- 
ized, these regulatory regions may be utilized in gene 
therapy or protein purification schemes to obtain the de- 
sired amount and locations of protein synthesis or to in- 
hibit, reduce, or prevent the synthesis of undesirable 
gene products. The regulatory regions may also be used for expressing polypeptides in the cell types from which the 5' ESTs of the present invention were isolated.
[0013] In addition, ESTs containing the 5' ends of protein genes may include sequences useful as probes for chromosome mapping and the identification of individuals. Thus, there is a need to identify and characterize the sequences upstream of the 5' coding sequences of genes.

Summary of the Invention 

[0014] The present invention relates to purified, isolated, or enriched 5' ESTs which include sequences derived from the authentic 5' ends of their corresponding mRNAs. The term "corresponding mRNA" refers to the mRNA which was the template for the cDNA synthesis which produced the 5' EST. These sequences will be referred to hereinafter as "5' ESTs". The present invention also includes purified, isolated or enriched nucleic acids comprising contigs assembled by determining a consensus sequence from a plurality of ESTs containing overlapping sequences. These contigs will be referred to herein as "consensus contigated ESTs."
[0015] As used herein, the term "purified" does not require absolute purity; rather, it is intended as a relative definition. Individual 5' EST clones isolated from a cDNA library have been conventionally purified to electrophoretic homogeneity. The sequences obtained from these clones could not be obtained directly either from the library or from total human DNA. The cDNA clones are not naturally occurring as such, but rather are obtained via manipulation of a partially purified naturally occurring substance (messenger RNA). The conversion of mRNA into a cDNA library involves the creation of a synthetic substance (cDNA), and pure individual cDNA clones can be isolated from the synthetic library by clonal selection. Thus, creating a cDNA library from messenger RNA and subsequently isolating individual clones from that library results in an approximately 10⁴- to 10⁶-fold purification of the native message. Purification of starting material or natural material to at least one order of magnitude, preferably two or three orders, and more preferably four or five orders of magnitude is expressly contemplated. Alternatively, purification may be expressed as "at least" a percent purity relative to heterologous polynucleotides (DNA, RNA or both). As a preferred embodiment, the polynucleotides of the present invention are at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% pure relative to heterologous polynucleotides. As a further preferred embodiment, the polynucleotides have an "at least" purity ranging from any number, to the thousandth position, between 90% and 100% (e.g., 5' EST at least 99.995% pure) relative to heterologous polynucleotides. Additionally, purity of the polynucleotides may be expressed as a percentage (as described above) relative to all materials and compounds other than the carrier solution. Each number, to the thousandth position, may be claimed as an individual species of purity.

[0016] As used herein, the term "isolated" requires that the material be removed from its original environment (e.g., the natural environment if it is naturally occurring). For example, a naturally-occurring polynucleotide present in a living animal is not isolated, but the same polynucleotide, separated from some or all of the coexisting materials in the natural system, is isolated. Specifically excluded from the definition of "isolated" are: naturally occurring chromosomes (e.g., chromosome spreads), artificial chromosome libraries, genomic libraries, and cDNA libraries that exist either as an in vitro nucleic acid preparation or as a transfected/transformed host cell preparation, wherein the host cells are either an in vitro heterogeneous preparation or plated as a heterogeneous population of single colonies. Also specifically excluded are the above libraries wherein the 5' EST makes up less than 5% of the number of nucleic acid inserts in the vector molecules. Further specifically excluded are whole cell genomic DNA or whole cell RNA preparations (including said whole cell preparations which are mechanically sheared or enzymatically digested). Further specifically excluded are the above whole cell preparations as either an in vitro preparation or as a heterogeneous mixture separated by electrophoresis (including blot transfers of the same) wherein the polynucleotides of the invention have not been further separated from the heterologous polynucleotides in the electrophoresis medium (e.g., further separating by excising a single band from a heterogeneous band population in an agarose gel or nylon blot).
[0017] As used herein, the term "recombinant" means that the 5' EST is adjacent to "backbone" nucleic acid to which it is not adjacent in its natural environment. Additionally, to be "enriched" the 5' ESTs will represent 5% or more of the number of nucleic acid inserts in a population of nucleic acid backbone molecules. Backbone molecules according to the present invention include nucleic acids such as expression vectors, self-replicating nucleic acids, viruses, integrating nucleic acids, and other vectors or nucleic acids used to maintain or manipulate a nucleic acid insert of interest. Preferably, the enriched 5' ESTs represent 15% or more of the number of nucleic acid inserts in the population of recombinant backbone molecules. More preferably, the enriched 5' ESTs represent 50% or more of the number of nucleic acid inserts in the population of recombinant backbone molecules. In a highly preferred embodiment, the enriched 5' ESTs represent 90% or more (including any value between 90% and 100%, to the thousandth position, e.g., 99.5%) of the number of nucleic acid inserts in the population of recombinant backbone molecules.
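The enrichment thresholds defined above (5%, 15%, 50% and 90% of inserts) reduce to a simple percentage calculation. The following minimal Python sketch classifies a population of backbone molecules from two insert counts; the function names and the tier labels are hypothetical, and how the counts are obtained (for example, by screening clones) is outside its scope.

```python
# Minimum percentages of 5' EST inserts, taken from the definition above,
# checked from the strictest tier down. The tier labels are hypothetical.
TIERS = [
    (90.0, "highly preferred"),
    (50.0, "more preferred"),
    (15.0, "preferred"),
    (5.0, "enriched"),
]


def enrichment_percent(est_inserts: int, total_inserts: int) -> float:
    """Percentage of nucleic acid inserts in the backbone population that are 5' ESTs."""
    if total_inserts <= 0:
        raise ValueError("the population must contain at least one insert")
    if not 0 <= est_inserts <= total_inserts:
        raise ValueError("est_inserts must lie between 0 and total_inserts")
    return 100.0 * est_inserts / total_inserts


def enrichment_tier(est_inserts: int, total_inserts: int) -> str:
    """Classify a population of backbone molecules against the 5/15/50/90% thresholds."""
    percent = enrichment_percent(est_inserts, total_inserts)
    for threshold, label in TIERS:
        if percent >= threshold:
            return label
    return "not enriched"


print(enrichment_tier(3, 50))      # 6% of inserts: enriched
print(enrichment_tier(996, 1000))  # 99.6% of inserts: highly preferred
```

Because every tier is a lower bound, checking from the highest threshold down returns the most stringent tier that a population satisfies.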
[0018] "Stringent," "moderate," and "low" hybridization conditions are as defined below.
[0019] The term "polypeptide" refers to a polymer of amino acids without regard to the length of the polymer; thus, peptides, oligopeptides, and proteins are included within the definition of polypeptide. This term also does not specify or exclude chemical or post-expression modifications of the polypeptides of the invention, although chemical or post-expression modifications of these polypeptides may be included or excluded as specific embodiments. Therefore, for example, modifications to polypeptides which include the covalent attachment of glycosyl groups, acetyl groups, phosphate groups, lipid groups and the like are expressly encompassed by the term polypeptide. Further, polypeptides with these modifications may be specified as individual species to be included or excluded from the present invention. The natural or other chemical modifications, such as those listed in the examples above, can occur anywhere in a polypeptide, including the peptide backbone, the amino acid side-chains and the amino or carboxyl termini. It will be appreciated that the same type of modification may be present in the same or varying degrees at several sites in a given polypeptide. Also, a given polypeptide may contain many types of modifications. Polypeptides may be branched, for example, as a result of ubiquitination, and they may be cyclic, with or without branching. Modifications include acetylation, acylation, ADP-ribosylation, amidation, covalent attachment of flavin, covalent attachment of a heme moiety, covalent attachment of a nucleotide or nucleotide derivative, covalent attachment of a lipid or lipid derivative, covalent attachment of phosphatidylinositol, cross-linking, cyclization, disulfide bond formation, demethylation, formation of covalent cross-links, formation of cysteine, formation of pyroglutamate, formylation, gamma-carboxylation, glycosylation, GPI anchor formation, hydroxylation, iodination, methylation, myristoylation, oxidation, pegylation, proteolytic processing, phosphorylation, prenylation, racemization, selenoylation, sulfation, transfer-RNA mediated addition of amino acids to proteins such as arginylation, and ubiquitination. (See, for instance, PROTEINS - STRUCTURE AND MOLECULAR PROPERTIES, 2nd Ed., T. E. Creighton, W. H. Freeman and Company, New York (1993); POST-TRANSLATIONAL COVALENT MODIFICATION OF PROTEINS, B. C. Johnson, Ed., Academic Press, New York, pgs. 1-12 (1983); Seifter et al., Meth. Enzymol. 182:626-646 (1990); Rattan et al., Ann. NY Acad. Sci. 663:48-62 (1992).) Also included within the definition are polypeptides which contain one or more analogs of an amino acid (including, for example, non-naturally occurring amino acids, amino acids which only occur naturally in an unrelated biological system, modified amino acids from mammalian systems, etc.), polypeptides with substituted linkages, as well as other modifications known in the art, both naturally occurring and non-naturally occurring.

[0020] As used interchangeably herein, the terms "nucleic acids", "oligonucleotides", and "polynucleotides" include RNA or DNA (either double or single stranded, coding or antisense), or RNA/DNA hybrid sequences of more than one nucleotide in either single chain or duplex form (although each of the above species may be particularly specified). The term "nucleotide" is used herein as an adjective to describe molecules comprising RNA, DNA, or RNA/DNA hybrid sequences of any length in single-stranded or duplex form. The term "nucleotide" is also used herein as a noun to refer to individual nucleotides or varieties of nucleotides, meaning a molecule, or individual unit in a larger nucleic acid molecule, comprising a purine or pyrimidine, a ribose or deoxyribose sugar moiety, and a phosphate group, or phosphodiester linkage in the case of nucleotides within an oligonucleotide or polynucleotide. The term "nucleotide" is also used herein to encompass "modified nucleotides", which comprise at least one of the following modifications: (a) an alternative linking group, (b) an analogous form of purine, (c) an analogous form of pyrimidine, or (d) an analogous sugar. For examples of analogous linking groups, purines, pyrimidines, and sugars, see for example PCT publication No. WO 95/04064. Preferred modifications of the present invention include, but are not limited to, 5-fluorouracil, 5-bromouracil, 5-chlorouracil, 5-iodouracil, hypoxanthine, xanthine, 4-acetylcytosine, 5-(carboxyhydroxymethyl)uracil, 5-carboxymethylaminomethyl-2-thiouridine, 5-carboxymethylaminomethyluracil, dihydrouracil, beta-D-galactosylqueosine, inosine, N6-isopentenyladenine, 1-methylguanine, 1-methylinosine, 2,2-dimethylguanine, 2-methyladenine, 2-methylguanine, 3-methylcytosine, 5-methylcytosine, N6-adenine, 7-methylguanine, 5-methylaminomethyluracil, 5-methoxyaminomethyl-2-thiouracil, beta-D-mannosylqueosine, 5'-methoxycarboxymethyluracil, 5-methoxyuracil, 2-methylthio-N6-isopentenyladenine, uracil-5-oxyacetic acid (v), wybutoxosine, pseudouracil, queosine, 2-thiocytosine, 5-methyl-2-thiouracil, 2-thiouracil, 4-thiouracil, 5-methyluracil, uracil-5-oxyacetic acid methylester, uracil-5-oxyacetic acid, 5-methyl-2-thiouracil, 3-(3-amino-3-N-2-carboxypropyl)uracil, and 2,6-diaminopurine. Methylenemethylimino-linked oligonucleosides, as well as mixed-backbone compounds, may be prepared as described in U.S. Pat. Nos. 5,378,825; 5,386,023; 5,489,677; 5,602,240; and 5,610,289. Formacetal and thioformacetal linked oligonucleosides may be prepared as described in U.S. Pat. Nos. 5,264,562 and 5,264,564. Ethylene oxide linked oligonucleosides may be prepared as described in U.S. Pat. No. 5,223,618. Phosphinate oligonucleotides may be prepared as described in U.S. Pat. No. 5,508,270. Alkyl phosphonate oligonucleotides may be prepared as described in U.S. Pat. No. 4,469,863. 5'-Deoxy-5'-methylene phosphonate oligonucleotides may be prepared as described in U.S. Pat. Nos. 5,610,289 or 5,625,050. Phosphoramidite oligonucleotides may be prepared as described in U.S. Pat. No. 5,256,775 or U.S. Pat. No. 5,366,878. Alkylphosphonothioate oligonucleotides may be prepared as described in published PCT applications WO 94/17093 and WO 94/02499. 3'-Deoxy-3'-amino phosphoramidate oligonucleotides may be prepared as described in U.S. Pat. No. 5,476,925. Phosphotriester oligonucleotides may be prepared as described in U.S. Pat. No. 5,023,243. Borano phosphate oligonucleotides may be prepared as described in U.S. Pat. Nos. 5,130,302 and 5,177,198.
[0021] The polynucleotide sequences of the invention may be prepared by any known method, including synthetic, recombinant, ex vivo generation, or a combination thereof, as well as utilizing any purification methods known in the art.

[0022] In specific embodiments, the polynucleotides of the invention are at least 15, at least 30, at least 50, at least 100, at least 125, at least 500, or at least 1000 contiguous nucleotides but are less than or equal to 300 kb, 200 kb, 100 kb, 50 kb, 10 kb, 7.5 kb, 5 kb, 2.5 kb, 2 kb, 1.5 kb, or 1 kb in length. In a further embodiment, polynucleotides of the invention comprise a portion of the coding sequences, as disclosed herein, but do not comprise all or a portion of any intron.
[0023] In another embodiment, the polynucleotides comprising coding sequences do not contain coding sequences of a genomic flanking gene (i.e., 5' or 3' to the gene of interest in the genome). In other embodiments, the polynucleotides of the invention do not contain the coding sequence of more than 1000, 500, 250, 100, 75, 50, 25, 20, 15, 10, 5, 4, 3, 2, or 1 genomic flanking gene(s). The terms "base paired" and "Watson & Crick base paired" are used interchangeably herein to refer to nucleotides which can be hydrogen bonded to one another by virtue of their sequence identities in a manner like that found in double-helical DNA, with thymine or uracil residues linked to adenine residues by two hydrogen bonds and cytosine and guanine residues linked by three hydrogen bonds (See Stryer, L., Biochemistry, 4th edition, 1995).
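To make the two-versus-three hydrogen bond rule concrete, the following minimal Python sketch counts the hydrogen bonds in a perfectly base-paired duplex. The function name `duplex_hydrogen_bonds` is hypothetical, and the sketch assumes only the canonical bases A, T, U, G and C; modified nucleotides and non-Watson & Crick pairs are not handled.

```python
# Hydrogen bonds contributed by each Watson & Crick pair, as stated above:
# A-T and A-U pairs form two bonds, G-C pairs form three.
H_BONDS = {"A": 2, "T": 2, "U": 2, "G": 3, "C": 3}


def duplex_hydrogen_bonds(strand: str) -> int:
    """Total hydrogen bonds when `strand` is fully base paired with its complement.

    Each base of the given strand stands for one base pair, so the partner
    strand does not need to be supplied. Modified bases are not handled.
    """
    total = 0
    for base in strand.upper():
        if base not in H_BONDS:
            raise ValueError(f"unsupported base: {base!r}")
        total += H_BONDS[base]
    return total


print(duplex_hydrogen_bonds("GATC"))  # 3 + 2 + 2 + 3 = 10
```

The count rises with G+C content, which is one reason GC-rich duplexes are generally more stable than AT-rich duplexes of the same length.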
[0024] The terms "complementary" or "complement thereof" are used herein to refer to the sequences of polynucleotides which are capable of forming Watson & Crick base pairing with another specified polynucleotide throughout the entirety of the complementary region. For the purpose of the present invention, a first polynucleotide is deemed to be complementary to a second polynucleotide when each base in the first polynucleotide is paired with its complementary base. Complementary bases are, generally, A and T (or A and U) or C and G. "Complement" is used herein as a synonym for "complementary polynucleotide", "complementary nucleic acid" and "complementary nucleotide sequence". These terms are applied to pairs of polynucleotides based solely upon their sequences and not any particular set of conditions under which the two polynucleotides would actually bind. Preferably, a "complementary" sequence is a sequence which has an A at each position where there is a T on the opposite strand, a T at each position where there is an A on the opposite strand, a G at each position where there is a C on the opposite strand and a C at each position where there is a G on the opposite strand.
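Because complementarity is defined here purely by sequence, it can be checked mechanically. The following minimal Python sketch assumes both strands are written 5' to 3' and have the same length (so the whole strand is the complementary region); it reads the second strand in reverse so that antiparallel facing positions line up. The names `dna_complement` and `is_complementary` are illustrative.

```python
# Watson & Crick pairs accepted as complementary: A-T, A-U and G-C.
PAIRS = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def dna_complement(seq: str) -> str:
    """Return the complementary DNA strand, written 5'->3' (the reverse complement)."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(seq.upper()))


def is_complementary(first: str, second: str) -> bool:
    """True if every base of `first` pairs with the facing base of `second`.

    Both strands are written 5'->3'. Because the strands are antiparallel,
    `second` is read in reverse so that facing positions line up. Only the
    sequences are compared; no hybridization conditions are modeled.
    """
    if len(first) != len(second):
        return False
    return all(
        (a, b) in PAIRS for a, b in zip(first.upper(), reversed(second.upper()))
    )


probe = "ATGCC"
print(dna_complement(probe))             # GGCAT
print(is_complementary(probe, "GGCAT"))  # True
print(is_complementary(probe, "GGCAA"))  # False
```

As in the definition, the check ignores whether the two strands would actually hybridize under any particular conditions.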

[0025] The terms "vertebrate nucleic acid" and "vertebrate polypeptide" are used herein to refer to any nucleic acid or polypeptide, respectively, which is derived from a vertebrate species, including birds and more usually mammals, preferably primates such as humans, farm animals such as swine, goats, sheep, donkeys, and horses, rabbits or rodents, more preferably rats or mice. As used herein, the term "vertebrate" is used to refer to any vertebrate, preferably a mammal. The term "vertebrate" expressly embraces human subjects unless preceded with the term "non-human".
[0026] Thus, 5' ESTs in cDNA libraries in which one or more 5' ESTs make up 5% or more of the number of nucleic acid inserts in the backbone molecules are "enriched recombinant 5' ESTs" as defined herein. Likewise, 5' ESTs in a population of plasmids in which one or more 5' ESTs of the present invention have been inserted such that they represent 5% or more of the number of inserts in the plasmid backbone are "enriched recombinant 5' ESTs" as defined herein. However, 5' ESTs in cDNA libraries in which 5' ESTs constitute less than 5% of the number of nucleic acid inserts in the population of backbone molecules, such as libraries in which backbone molecules having a 5' EST insert are extremely rare, are not "enriched recombinant 5' ESTs."
[0027] The term "capable of hybridizing to the polyA tail of said mRNA" refers to and embraces all primers containing stretches of thymidine residues, so-called oligo(dT) primers, that hybridize to the 3' end of eukaryotic poly(A)+ mRNAs to prime the synthesis of a first cDNA strand. Techniques for generating said oligo(dT) primers and hybridizing them to mRNA to subsequently prime the reverse transcription of said hybridized mRNA to generate a first cDNA strand are well known to those skilled in the art and are described in Current Protocols in Molecular Biology, John Wiley and Sons, Inc. 1997 and Sambrook et al., Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor Laboratory Press, 1989, the entire disclosures of which are incorporated herein by reference. Preferably, said oligo(dT) primers are present in a large excess in order to allow the hybridization of all mRNA 3' ends to at least one oligo(dT) molecule. The priming and reverse transcription steps are preferably performed between 37°C and 55°C, depending on the type of reverse transcriptase used.
[0028] Preferred oligo(dT) primers for priming reverse transcription of mRNAs are oligonucleotides containing a stretch of thymidine residues of sufficient length to hybridize specifically to the polyA tail of mRNAs, preferably of 12 to 18 thymidine residues in length. More preferably, such oligo(dT) primers comprise an additional sequence upstream of the poly(dT) stretch in order to allow the addition of a given sequence to the 5' end of all first cDNA strands, which may then be used to facilitate subsequent manipulation of the cDNA. Preferably, this added sequence is 8 to 60 residues in length. For instance, the addition of a restriction site at the 5' end of cDNAs facilitates subcloning of the obtained cDNA. Alternatively, such an added 5' end may also be used to design PCR primers to specifically amplify cDNA clones of interest.
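The primer layout described above (a 12 to 18 residue poly(dT) stretch with an 8 to 60 residue sequence added upstream, i.e. on its 5' side) can be sketched in Python as follows. This is a minimal illustration only: the function name, the default tail and the choice of a NotI recognition site (GCGGCCGC) are assumptions made for the example, not primers disclosed in this application.

```python
# A hypothetical 5' tail: four arbitrary bases followed by a NotI site (GCGGCCGC).
# Any 8- to 60-residue sequence would satisfy the ranges given above.
EXAMPLE_TAIL = "GAGA" + "GCGGCCGC"


def oligo_dt_primer(tail: str = EXAMPLE_TAIL, dt_length: int = 18) -> str:
    """Build a tailed oligo(dT) primer, written 5'->3' (tail first, then poly(dT))."""
    if not 12 <= dt_length <= 18:
        raise ValueError("the poly(dT) stretch should be 12 to 18 residues long")
    tail = tail.upper()
    if not 8 <= len(tail) <= 60:
        raise ValueError("the added 5' sequence should be 8 to 60 residues long")
    if set(tail) - set("ACGT"):
        raise ValueError("the tail may contain only A, C, G and T")
    return tail + "T" * dt_length


primer = oligo_dt_primer()
print(primer, len(primer))
```

Because the tail sits on the 5' side of the poly(dT) stretch, it is copied onto the 5' end of every first cDNA strand, where the restriction site can later be used for subcloning or the tail sequence for PCR priming.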
[0029] In some embodiments, the present invention 
relates to 5' ESTs which are derived from genes encod- 
ing secreted proteins. As used herein, a "secreted" pro- 
tein is one which, when expressed in a suitable host cell, 
is transported across or through a membrane, including 
transport as a result of signal peptides in its amino acid 
sequence. "Secreted" proteins include without limitation 
proteins secreted wholly (e.g. soluble proteins), or par- 
tially (e.g. receptors) from the cell in which they are ex- 
pressed. "Secreted" proteins also include without limi- 
tation proteins which are transported across the mem- 
brane of the endoplasmic reticulum. 
[0030] Such 5' ESTs include nucleic acid sequences, called signal sequences, which encode signal peptides that direct the extracellular secretion of the proteins encoded by the genes from which the 5' ESTs are derived. Generally, the signal peptides are located at the amino termini of secreted proteins. Polypeptides comprising these signal peptides (as delineated in the sequence listing), and polynucleotides encoding the same, are preferred embodiments of the present invention.

[0031] Secreted proteins are translated by ribosomes associated with the "rough" endoplasmic reticulum. Generally, secreted proteins are co-translationally transferred to the membrane of the endoplasmic reticulum. Association of the ribosome with the endoplasmic reticulum during translation of secreted proteins is mediated by the signal peptide. The signal peptide is typically cleaved following its co-translational entry into the endoplasmic reticulum. After delivery to the endoplasmic reticulum, secreted proteins may proceed through the Golgi apparatus. In the Golgi apparatus, the proteins may undergo post-translational modification before entering secretory vesicles which transport them across the cell membrane.

[0032] The 5' ESTs of the present invention have several important applications. For example, the 5' EST sequences of the sequence listing, and fragments thereof, may be used to distinguish human tissues or cells from non-human tissues or cells and to distinguish between human tissues or cells that do and do not express polynucleotides comprising the 5' EST sequences of the present invention. By knowing the tissue expression pattern of the 5' EST sequences, either through routine experimentation or by using the Tables herein, the polynucleotides of the present invention may be used in methods of determining the identity of an unknown tissue or cell sample. For example, if a 5' EST is expressed in a particular tissue or cell type, as shown in the Tables below, and the unknown tissue or cell sample does not express the 5' EST, it may be inferred that the unknown tissue or cells are either not human or not the same human tissue or cell type as that which expresses the 5' EST. Conversely, if a 5' EST is not expressed in a particular tissue or cell type, as shown in the Tables below, and the unknown tissue or cell sample does express the 5' EST, it may be inferred that the unknown tissue or cells are either not human or not the same human tissue or cell type as that which does not express the 5' EST. The above procedure may be used for either homogeneous or heterogeneous tissue or cell samples, since one may only want to narrow the identity to human or non-human or to a tissue type. Further assays may be used in conjunction with the above methods to narrow or confirm the identification. These methods of determining tissue or cell identity are based on detecting the presence or absence of the 5' EST sequences in a tissue or cell sample using methods well known in the art (e.g., hybridization or PCR methods).
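The inference rule in [0032] amounts to ruling out every reference tissue whose expected expression of a 5' EST disagrees with what is observed in the unknown sample. The following is a minimal Python sketch, assuming a homogeneous sample and a reference expression table keyed by EST and tissue; the identifiers EST_A and EST_B and the tissue names are made-up placeholders, not data from the Tables.

```python
from typing import Dict, Set


def candidate_tissues(expression_table: Dict[str, Dict[str, bool]],
                      observations: Dict[str, bool]) -> Set[str]:
    """Return the reference tissues not ruled out by the observations.

    expression_table maps an EST identifier to {tissue: expressed?};
    observations maps an EST identifier to whether it was detected
    (e.g. by hybridization or PCR) in the unknown sample.
    An empty result suggests the sample is not human or is a tissue
    type absent from the table.
    """
    tissues = {t for profile in expression_table.values() for t in profile}
    remaining = set(tissues)
    for est_id, detected in observations.items():
        profile = expression_table.get(est_id, {})
        for tissue, expressed in profile.items():
            # Expressed in the tissue but absent from the sample, or vice
            # versa: the sample is not that tissue (or not human).
            if expressed != detected:
                remaining.discard(tissue)
    return remaining


# Made-up reference data for illustration only.
table = {
    "EST_A": {"liver": True, "brain": False},
    "EST_B": {"liver": True, "brain": True},
}
print(candidate_tissues(table, {"EST_A": False}))  # {'brain'}
```

Each additional EST only removes candidates, which matches the text's point that further assays narrow or confirm the identification; heterogeneous samples would need a more permissive rule than this sketch applies.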



[0033] In other useful applications, fragments of the 5' EST sequences encoding signal peptides, as well as degenerate polynucleotides encoding the same, may be ligated to sequences encoding either the polypeptide from the same gene or a heterologous polypeptide to facilitate secretion. The 5' EST sequences, and fragments thereof, may also be used to obtain and express cDNA clones which include the full protein coding sequences of the corresponding gene products, including the authentic translation start sites derived from the 5' ends of the coding sequences of the mRNAs from which the 5' ESTs are derived. These cDNAs will be referred to hereinafter as "full-length cDNAs." These cDNAs may also include DNA derived from mRNA sequences upstream of the translation start site. The full-length cDNA sequences may be used to express the proteins corresponding to the 5' ESTs. As discussed above, secreted proteins and non-secreted proteins may be therapeutically important. Thus, the proteins expressed from the cDNAs may be useful in treating or controlling a variety of human conditions. The 5' ESTs may also be used to obtain the corresponding genomic DNA. The term "corresponding genomic DNA" refers to the genomic DNA which encodes the mRNA from which the 5' EST was derived.

[0034] Another use of the polynucleotides of the present invention is to map and clone promoter regions and open reading frames from a genomic sequence. For example, the 5' ESTs can be used in combination with the sequence information from genome sequencing projects, such as the U.S. Human Genome Project or other public and private genome sequencing projects, to map and clone regions of the genome that comprise promoters and expressed open reading frames. The polynucleotides of the present invention are particularly useful for mapping and identifying coding regions (regions containing expressed open reading frames) from a genomic sequence, because the vast majority of the human genome does not encode expressed genes and because authentic open reading frames (open reading frames that encode expressed genes) are difficult to identify. The 5' EST sequences of the present invention can be used in conjunction with various algorithms to identify promoter or entire ORF sequences.
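The algorithms referred to above are not specified here. As a deliberately naive illustration of the kind of scan involved, the Python sketch below reports ATG-to-stop open reading frames on the forward strand in all three frames. It rests on simplifying assumptions (standard genetic code, forward strand only, no splicing, no promoter detection, an arbitrary minimum length) and is not the method of this disclosure; practical gene-finding tools are far more elaborate.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def find_orfs(seq: str, min_codons: int = 30) -> List[Tuple[int, int]]:
    """Return 0-based, end-exclusive (start, end) spans of forward-strand ORFs.

    An ORF runs from an ATG to the first in-frame stop codon; its length in
    codons (stop codon included) must be at least min_codons.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


print(find_orfs("CCATGAAATTTTAGCC", min_codons=4))  # [(2, 14)]
```

In the scenario described above, a 5' EST aligned to genomic sequence would indicate which candidate ORF start is authentic, something a scan of this kind cannot decide on its own.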

[0035] Alternatively, the 5' ESTs may be used to obtain and express extended cDNAs encoding portions of the protein. In the case of secreted proteins, the portions may comprise the signal peptides of the secreted proteins or the mature proteins generated when the signal peptide is cleaved off.
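When the signal peptide boundary is known (for example, as delineated in the sequence listing), separating a secreted polypeptide into its signal peptide and the mature protein produced by cleavage is a simple slice. The following is a minimal Python sketch; the function name and the short example sequence are hypothetical and do not correspond to any SEQ ID NO.

```python
from typing import Tuple


def split_at_signal_peptide(protein: str, signal_length: int) -> Tuple[str, str]:
    """Split a one-letter amino acid sequence into (signal peptide, mature protein).

    signal_length is the number of N-terminal residues in the signal peptide,
    i.e. cleavage occurs between residues signal_length and signal_length + 1.
    """
    if not 0 < signal_length < len(protein):
        raise ValueError("signal_length must leave a non-empty mature protein")
    return protein[:signal_length], protein[signal_length:]


signal, mature = split_at_signal_peptide("MKLLVLAVSAQADEFGH", 12)
print(signal, mature)  # MKLLVLAVSAQA DEFGH
```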

[0036] The present invention includes isolated, purified, or enriched "EST-related nucleic acids." The terms "isolated", "purified" or "enriched" have the meanings provided above. As used herein, the term "EST-related nucleic acids" means the nucleic acids of SEQ ID NOs. 24-3883 and 7744-19335, extended cDNAs obtainable using the nucleic acids of SEQ ID NOs. 24-3883 and 7744-19335, full-length cDNAs obtainable using the nucleic acids of SEQ ID NOs. 24-3883 and 7744-19335, or genomic DNAs obtainable using the nucleic acids of SEQ ID NOs. 24-3883 and 7744-19335. The present invention also includes the sequences complementary to, or allelic variants of, the EST-related nucleic acids.
[0037] The present invention also includes isolated, purified, or enriched "fragments of EST-related nucleic acids." The terms "isolated", "purified" and "enriched" have the meanings described above. As used herein, the term "fragments of EST-related nucleic acids" means fragments comprising at least 8, 10, 12, 15, 18, 20, 23, 25, 28, 30, 35, 40, 50, 75, 100, 200, 300, 500, or 1000 consecutive nucleotides of the EST-related nucleic acids, to the extent that fragments of these lengths are consistent with the lengths of the particular EST-related nucleic acids being referred to. The present invention also includes the sequences complementary to the fragments of the EST-related nucleic acids. In particular, fragments of EST-related nucleic acids refer to the polynucleotides described in Tables IVa and IVb, and polynucleotides described in Tables IVa and IVb updated as defined below.

[0038] The present invention also includes isolated, purified, or enriched "positional segments of EST-related nucleic acids." The terms "isolated", "purified", or "enriched" have the meanings provided above. As used herein, the term "positional segments of EST-related nucleic acids" includes segments comprising nucleotides 1-25, 26-50, 51-75, 76-100, 101-125, 126-150, 151-175, 176-200, 201-225, 226-250, 251-300, 301-325, 326-350, 351-375, 376-400, 401-425, 426-450, 451-475, 476-500, 501-525, 526-550, 551-575, 576-600 and 601-the terminal nucleotide of the EST-related nucleic acids, to the extent that such nucleotide positions are consistent with the lengths of the particular EST-related nucleic acids being referred to, and wherein position "1" is defined as the 5'-most position defined in the sequence listing or Tables below. The term "positional segments of EST-related nucleic acids" also includes segments comprising nucleotides 1-50, 51-100, 101-150, 151-200, 201-250, 251-300, 301-350, 351-400, 401-450, 451-500, 501-550, 551-600 or 601-the terminal nucleotide of the EST-related nucleic acids, to the extent that such nucleotide positions are consistent with the lengths of the particular EST-related nucleic acids being referred to. The term "positional segments of EST-related nucleic acids" also includes segments comprising nucleotides 1-100, 101-200, 201-300, 301-400, 401-500, 501-600, or 601-the terminal nucleotide of the EST-related nucleic acids, to the extent that such nucleotide positions are consistent with the lengths of the particular EST-related nucleic acids being referred to. In addition, the term "positional segments of EST-related nucleic acids" includes segments comprising nucleotides 1-200, 201-400, 401-600, or 601-the terminal nucleotide of the EST-related nucleic acids, to the extent that such nucleotide positions are consistent with the lengths of the particular EST-related nucleic acids being referred to. The present invention also includes the sequences complementary to the positional segments of EST-related nucleic acids.
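The 50-, 100- and 200-nucleotide lists in [0038] follow a regular pattern: fixed-size segments laid end to end from position 1 up to position 600, then one segment from 601 to the terminal nucleotide. A minimal Python sketch of that pattern follows. It assumes that "to the extent ... consistent with the lengths" means segments are clipped at the end of the sequence; the function names are hypothetical.

```python
from typing import List, Tuple


def positional_segments(length: int, step: int, tail_start: int = 601) -> List[Tuple[int, int]]:
    """Return 1-based, inclusive (start, end) segments for a sequence of `length`.

    Segments of `step` positions are laid end to end from position 1 up to
    tail_start - 1; one final segment runs from tail_start to the terminal
    position. Segments are clipped to the sequence length (an assumption).
    """
    segments = []
    start = 1
    while start < tail_start and start <= length:
        end = min(start + step - 1, tail_start - 1, length)
        segments.append((start, end))
        start += step
    if length >= tail_start:
        segments.append((tail_start, length))
    return segments


def extract(seq: str, segment: Tuple[int, int]) -> str:
    """Slice a 1-based, inclusive segment out of a sequence string."""
    start, end = segment
    return seq[start - 1:end]


print(positional_segments(650, 200))  # [(1, 200), (201, 400), (401, 600), (601, 650)]
```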
[0039] The present invention also includes isolated, purified, or enriched "fragments of positional segments of EST-related nucleic acids." The terms "isolated", "purified", or "enriched" have the meanings provided above. As used herein, the term "fragments of positional segments of EST-related nucleic acids" refers to fragments comprising at least 8, 10, 15, 18, 20, 23, 25, 28, 30, 35, 40, 50, 75, 100, 150, or 200 consecutive nucleotides of the positional segments of EST-related nucleic acids. The present invention also includes the sequences complementary to the fragments of positional segments of EST-related nucleic acids.

[0040] In addition to the above "positional segments of EST-related nucleic acids" and "fragments of positional segments of EST-related nucleic acids", for the nucleic acids of SEQ ID NOs. 24-3883 and 7744-19335, further preferred nucleic acids comprise at least 8 nucleotides, wherein "at least 8" is defined as any integer between 8 and the integer representing the 3'-most nucleotide position in the sequence listing or Tables below. Further included are nucleic acid fragments at least 8 nucleotides in length, as described above, that are further specified in terms of their 5' and 3' positions. The 5' and 3' positions are represented by the position numbers set forth in the sequence listing below. Therefore, every combination of a 5' and 3' nucleotide position that a fragment at least 8 contiguous nucleotides in length could occupy is included in the invention as an individual species. The polynucleotide fragments specified by 5' and 3' positions can be immediately envisaged and are therefore not individually listed, solely for the purpose of not unnecessarily lengthening the specification. It is noted that the above species of polynucleotide fragments of the present invention may alternatively be described by the formula "a to b", where "a" equals the 5' nucleotide position and "b" equals the 3' nucleotide position of the polynucleotide fragment; further where "a" equals an integer between 1 and the number of nucleotides of the polynucleotide sequence of the present invention minus 8, and where "b" equals an integer between 9 and the number of nucleotides of the polynucleotide sequence of the present invention; and where "a" is an integer smaller than "b" by at least 8.
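The "a to b" notation of [0040] can be made concrete by enumerating the (a, b) pairs it allows for a sequence of n nucleotides, taking the stated constraints literally (1 <= a <= n - 8, 9 <= b <= n, and b - a >= 8). Because b ranges from a + 8 to n for each a, the number of pairs is the triangular number (n - 8)(n - 7)/2 for n > 8. A minimal Python sketch; the function names are hypothetical.

```python
from typing import Iterator, Tuple


def fragment_species(n: int, min_gap: int = 8) -> Iterator[Tuple[int, int]]:
    """Yield every (a, b) with 1 <= a <= n - min_gap and a + min_gap <= b <= n."""
    for a in range(1, n - min_gap + 1):
        for b in range(a + min_gap, n + 1):
            yield a, b


def count_fragment_species(n: int, min_gap: int = 8) -> int:
    """Closed form for the number of pairs yielded by fragment_species."""
    k = n - min_gap
    return k * (k + 1) // 2 if k > 0 else 0


print(list(fragment_species(10)))   # [(1, 9), (1, 10), (2, 10)]
print(count_fragment_species(500))  # 121278
```

The count grows quadratically with sequence length, which illustrates why the specification describes these species by formula rather than listing them individually.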
[0041] The present invention also provides for the exclusion of any polynucleotide fragments specified by 5' and 3' positions or by size in nucleotides as described above. Any number of fragments specified by 5' and 3' positions or by size in nucleotides, as described above, may be excluded.

[0042] The present invention also includes isolated or purified "EST-related polypeptides." The terms "isolated" or "purified" have the meanings provided above. As used herein, the term "EST-related polypeptides" means the polypeptides encoded by the EST-related nucleic acids, including the polypeptides of SEQ ID NOs. 3884-7743.

[0043] The present invention also includes isolated or purified "fragments of EST-related polypeptides." The terms "isolated" or "purified" have the meanings provided above. As used herein, the term "fragments of EST-related polypeptides" means fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids of an EST-related polypeptide, to the extent that fragments of these lengths are consistent with the lengths of the particular EST-related polypeptides being referred to. In particular, fragments of EST-related polypeptides refer to polypeptides encoded by polynucleotides described in Tables IVa and IVb, and polynucleotides described in Tables IVa and IVb updated.

[0044] The present invention also includes isolated or purified "positional segments of EST-related polypeptides." As used herein, the term "positional segments of EST-related polypeptides" includes polypeptides comprising amino acid residues 1-25, 26-50, 51-75, 76-100, 101-125, 126-150, 151-175, 176-200, or 201-the C-terminal amino acid of the EST-related polypeptides, to the extent that such amino acid residues are consistent with the lengths of the particular EST-related polypeptides being referred to. The term "positional segments of EST-related polypeptides" also includes segments comprising amino acid residues 1-50, 51-100, 101-150, 151-200 or 201-the C-terminal amino acid of the EST-related polypeptides, to the extent that such amino acid residues are consistent with the lengths of the particular EST-related polypeptides being referred to. The term "positional segments of EST-related polypeptides" also includes segments comprising amino acids 1-100 or 101-200 of the EST-related polypeptides, to the extent that such amino acid residues are consistent with the lengths of the particular EST-related polypeptides being referred to. In addition, the term "positional segments of EST-related polypeptides" includes segments comprising amino acid residues 1-200 or 201-the C-terminal amino acid of the EST-related polypeptides, to the extent that such amino acid residues are consistent with the lengths of the particular EST-related polypeptides being referred to.
[0045] The present invention also includes isolated or 
purified "fragments of positional segments of EST-relat- 
ed polypeptides." The terms "isolated" or "purified" have 
the meanings provided above. As used herein, the term 
"fragments of positional segments of EST-related 
polypeptides" means fragments comprising at least 5, 
10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecu- 
tive amino acids of positional segments of EST-related 
polypeptides to the extent that fragments of these 
lengths are consistent with the lengths of the particular 
EST-related polypeptides being referred to. 
[0046] In addition to the above "positional segments of EST-related polypeptides" and "fragments of positional segments of EST-related polypeptides", for the polypeptides of the present invention, further preferred polypeptides comprise at least 8 amino acids, wherein "at least 8" is defined as any integer between 8 and the integer representing the C-terminal amino acid of the polypeptide of the present invention, including the polypeptide sequences of the sequence listing below. Further included are polypeptide fragments at least 8 amino acids in length, as described above, that are further specified in terms of their N-terminal and C-terminal positions. Preferred polypeptide fragment species specified by their N-terminal and C-terminal positions include the signal peptides delineated in the sequence listing below. However, included in the present invention as individual species are all polypeptide fragments at least 5 amino acids in length, as described above, which may be particularly specified by an N-terminal and C-terminal position.

[0047] The present invention also provides for the exclusion of any fragments specified by N-terminal and C-terminal positions or by size in amino acid residues as described above. Any number of fragment species specified by N-terminal and C-terminal positions, or subgenus of fragments specified by size in amino acid residues as described above, may be excluded from the present invention.

[0048] The polypeptide fragments of the present invention can be immediately envisaged using the above description and are therefore not individually listed, solely for the purpose of not unnecessarily lengthening the specification. The above fragments need not be active, since they would be useful, for example, in immunoassays, in epitope mapping, in epitope tagging, as vaccines, to raise antibodies, to stimulate an immune response in a heterologous species, and as molecular weight markers. The above fragments may also be used to generate antibodies to a particular portion of the polypeptide. These antibodies can then be used in immunoassays well known in the art to distinguish between human and non-human cells and tissues or to determine whether cells or tissues in a biological sample are or are not of the same type which express the polypeptide of the present invention. Further preferred polypeptide fragments of the present invention comprise the signal peptides as delineated in the sequence listing. These signal peptides may be used to facilitate secretion of either the polypeptide of the same gene or a heterologous polypeptide.

[0049] The present invention also includes antibodies which specifically recognize the EST-related polypeptides, fragments of EST-related polypeptides, positional segments of EST-related polypeptides, or fragments of positional segments of EST-related polypeptides. In the case of secreted proteins, such as those of SEQ ID NOs. 5199-5919, antibodies which specifically recognize the mature protein generated when the signal peptide is cleaved may also be obtained as described below. Similarly, antibodies which specifically recognize the signal peptides of SEQ ID NOs. 3884-4243 or 5199-5919 may also be obtained.
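For readers working with the sequence listing, the SEQ ID NO. ranges cited in paragraphs [0036] to [0082] can be kept in a small lookup table. The Python sketch below is an illustration only: the ranges are copied from this section, the labels are paraphrases, and the ranges overlap (for example, 1339-2059 lies within 24-3883), so one number may receive several labels. Numbers outside every cited range simply return an empty list.

```python
from typing import List, Tuple

# (first SEQ ID NO., last SEQ ID NO., paraphrased description from this section)
SEQ_ID_RANGES: List[Tuple[int, int, str]] = [
    (24, 3883, "EST-related nucleic acid"),
    (24, 383, "nucleic acid with a span encoding a signal peptide"),
    (1339, 2059, "nucleic acid with full coding sequence (signal peptide + mature protein)"),
    (3884, 7743, "EST-related polypeptide"),
    (3884, 4243, "polypeptide including a signal peptide"),
    (5199, 5919, "secreted polypeptide (signal peptide + mature protein)"),
    (7744, 19335, "EST-related nucleic acid"),
]


def describe_seq_id(seq_id: int) -> List[str]:
    """Return every label whose range contains seq_id (empty if none)."""
    return [label for lo, hi, label in SEQ_ID_RANGES if lo <= seq_id <= hi]


print(describe_seq_id(5200))
```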

[0050] In some embodiments and in the case of secreted proteins, the EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of nucleic acids include a signal sequence. In other embodiments, the EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of nucleic acids may include the full coding sequence for the protein or, in the case of secreted proteins, the full coding sequence of the mature protein (i.e., the protein generated when the signal polypeptide is cleaved off). In addition, the EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of nucleic acids may include regulatory regions upstream of the translation start site or downstream of the stop codon which control the amount, location, or developmental stage of gene expression.

[0051] As discussed above, both secreted and non-secreted human proteins may be therapeutically important. Thus, the proteins expressed from the EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of nucleic acids may be useful in treating or controlling a variety of human conditions.

[0052] The EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of nucleic acids may be used in forensic procedures to identify individuals or in diagnostic procedures to identify individuals having genetic diseases resulting from abnormal gene expression. In addition, the EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of nucleic acids are useful for constructing a high-resolution map of the human chromosomes.

[0053] The present invention also relates to secretion vectors capable of directing the secretion of a protein of interest. Such vectors may be used in gene therapy strategies in which it is desired to produce a gene product in one cell which is to be delivered to another location in the body. Secretion vectors may also facilitate the purification of desired proteins. The secretion vectors may also be used to express a desired protein, such as a heterologous protein, such that the protein is secreted into the culture medium, thereby facilitating purification.
[0054] The present invention also relates to expression vectors capable of directing the expression of an inserted gene in a desired spatial or temporal manner or at a desired level. Such vectors may include sequences upstream of the EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of nucleic acids, such as promoters or upstream regulatory sequences. Preferred chimeric polypeptides, and vectors encoding the same, comprise a signal peptide set forth in the sequence listing below.
[0055] The present invention also comprises fusion vectors for making chimeric polypeptides comprising a first polypeptide and a second polypeptide. Such vectors are useful for determining the cellular localization of the chimeric polypeptides or for isolating, purifying or enriching the chimeric polypeptides.
[0056] The EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of nucleic acids may also be used for gene therapy to control or treat genetic diseases. In the case of secreted proteins, signal peptides may be fused to heterologous proteins to direct their extracellular secretion.
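At the nucleic acid level, fusing a signal sequence to a heterologous coding sequence requires keeping the two in the same reading frame. The following is a minimal Python sketch under stated assumptions: both inputs are DNA coding sequences written 5' to 3', the signal sequence starts with ATG and has no in-frame stop codon, and the heterologous sequence's own initiator ATG is dropped by default (a common design choice, not one specified here). The function name and example sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def fuse_signal_sequence(signal_cds: str, target_cds: str,
                         drop_target_start: bool = True) -> str:
    """Return signal_cds joined in frame to target_cds (both 5'->3' DNA)."""
    signal_cds, target_cds = signal_cds.upper(), target_cds.upper()
    if len(signal_cds) % 3 != 0:
        raise ValueError("signal sequence must be a whole number of codons")
    if not signal_cds.startswith("ATG"):
        raise ValueError("signal sequence should begin with an ATG start codon")
    codons = [signal_cds[i:i + 3] for i in range(0, len(signal_cds), 3)]
    if any(codon in STOP_CODONS for codon in codons):
        raise ValueError("in-frame stop codon inside the signal sequence")
    if drop_target_start and target_cds.startswith("ATG"):
        target_cds = target_cds[3:]  # avoid an internal initiator Met
    return signal_cds + target_cds


print(fuse_signal_sequence("ATGAAACTG", "ATGGCTTAA"))  # ATGAAACTGGCTTAA
```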
[0057] Bacterial clones containing Bluescript plasmids having inserts containing the sequences of the non-aligned 5' ESTs, also referred to as singletons, and the sequences of the 5' ESTs which were aligned to yield consensus contigated 5' ESTs are presently stored at -80°C in 4% (v/v) glycerol in the inventor's laboratories under internal designations. The non-aligned 5' ESTs of the invention are those sequences which are present in the sequence listing but whose identification number either corresponds to a single EST from a single tissue in the second column of Table V or is absent from the first column of Table V. The inserts may be recovered from the stored materials by growing the appropriate clones on a suitable medium. The Bluescript DNA can then be isolated using plasmid isolation procedures familiar to those skilled in the art, such as alkaline lysis minipreps or large-scale alkaline lysis plasmid isolation procedures. If desired, the plasmid DNA may be further enriched by centrifugation on a cesium chloride gradient, size exclusion chromatography, or anion exchange chromatography. The plasmid DNA obtained using these procedures may then be manipulated using standard cloning techniques familiar to those skilled in the art. Alternatively, a PCR can be performed with primers designed at both ends of the inserted EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of nucleic acids. The PCR product which corresponds to the EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of nucleic acids can then be manipulated using standard cloning techniques familiar to those skilled in the art.
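The alternative recovery route in [0057], PCR with primers at both ends of the insert, can be sketched as taking the first bases of the insert as the forward primer and the reverse complement of its last bases as the reverse primer. This Python sketch assumes a fixed primer length of 20 nucleotides by default; real primer design would also weigh melting temperature, GC content and secondary structure, which are not modelled here. The function names are hypothetical.

```python
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def end_primers(insert: str, length: int = 20) -> Tuple[str, str]:
    """Return (forward, reverse) primers, both written 5'->3'."""
    insert = insert.upper()
    if len(insert) < 2 * length:
        raise ValueError("insert is too short for non-overlapping primers")
    forward = insert[:length]
    reverse = reverse_complement(insert[-length:])
    return forward, reverse


print(end_primers("AAAACCCCGGGGTTTTACGG", length=4))  # ('AAAA', 'CCGT')
```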
[0058] One embodiment of the present invention is a purified nucleic acid comprising a sequence selected from the group consisting of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335 and sequences complementary to the sequences of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335.

[0059] Another embodiment of the present invention is a purified nucleic acid comprising at least 8, 10, 12, 15, 18, 20, 23, 25, 28, 30, 35, 40, 50, 75, 100, 200, 300, 500, or 1000 consecutive nucleotides, to the extent that fragments of these lengths are consistent with the specific sequence, of a sequence selected from the group consisting of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335 and sequences complementary to the sequences of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335.

[0060] A further aspect of this embodiment is a purified vertebrate nucleic acid comprising at least 8, 10, 12, 15, 18, 20, 23, 25, 28, 30, 35, 40, 50, 75, 100, 200, 300, 500, or 1000 consecutive nucleotides, to the extent that fragments of these lengths are consistent with the specific sequence, of a sequence selected from the group consisting of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335 and sequences complementary to the sequences of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335.

[0061] A further aspect of this embodiment is a purified human nucleic acid comprising at least 8, 10, 12, 15, 18, 20, 23, 25, 28, 30, 35, 40, 50, 75, 100, 200, 300, 500, or 1000 consecutive nucleotides, to the extent that fragments of these lengths are consistent with the specific sequence, of a sequence selected from the group consisting of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335 and sequences complementary to the sequences of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335.

[0062] Another embodiment of the present invention is a purified nucleic acid comprising at least 8, 10, 12, 15, 18, 20, 23, 25, 28, 30, 35, 40, 50, 75, 100, 200, 300, 500, or 1000 consecutive nucleotides, to the extent that fragments of these lengths are consistent with the specific sequence, of a sequence selected from the group consisting of the preferred polynucleotides described in Tables IVa and IVb and sequences complementary to the sequences of the preferred polynucleotides described in Tables IVa and IVb.

[0063] Another embodiment of the present invention is a purified nucleic acid comprising at least 8, 10, 12, 15, 18, 20, 23, 25, 28, 30, 35, 40, 50, 75, 100, 200, 300, 500, or 1000 consecutive nucleotides, to the extent that fragments of these lengths are consistent with the specific sequence, of a sequence selected from the group consisting of the preferred polynucleotides described in Tables IVa and IVb updated and sequences complementary to the sequences of the preferred polynucleotides described in Tables IVa and IVb updated.
[0064] Another embodiment of the present invention 
is a purified nucleic acid comprising at least 15 consecutive nucleotides of a sequence selected from the group
consisting of SEQ ID NOs. 24-3883 and SEQ ID NOs. 
7744-19335 and sequences complementary to the se- 
quences of SEQ ID NOs. 24-3883 and SEQ ID NOs. 
7744-19335. 

[0065] A further aspect of this embodiment is a purified vertebrate nucleic acid comprising at least 15 consecutive nucleotides of a sequence selected from the group consisting of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335 and sequences complementary to the sequences of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335.

[0066] A further aspect of this embodiment is a purified human nucleic acid comprising at least 15 consecutive nucleotides of a sequence selected from the group consisting of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335 and sequences complementary to the sequences of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335.

[0067] Another embodiment of the present invention is a purified nucleic acid comprising at least 15 consecutive nucleotides of a sequence selected from the group consisting of the preferred polynucleotides described in Tables IVa and IVb and sequences complementary to the sequences of the preferred polynucleotides described in Tables IVa and IVb.

[0068] A further embodiment of the present invention is a purified nucleic acid comprising the coding sequence of a sequence selected from the group consisting of SEQ ID NOs. 24-3883.

[0069] Yet another embodiment of the present invention is a purified nucleic acid comprising the full coding sequence of a sequence selected from the group consisting of SEQ ID NOs. 1339-2059, wherein the full coding sequence comprises the sequence encoding the signal peptide and the sequence encoding the mature protein.

[0070] Still another embodiment of the present invention is a purified nucleic acid comprising a contiguous span of a sequence selected from the group consisting of SEQ ID NOs. 1339-2059 which encodes the mature protein.

[0071] Another embodiment of the present invention is a purified nucleic acid comprising a contiguous span of a sequence selected from the group consisting of SEQ ID NOs. 24-383 and 1339-2059 which encodes the signal peptide.

[0072] Another embodiment of the present invention is a purified nucleic acid encoding a polypeptide comprising a sequence selected from the group consisting of the sequences of SEQ ID NOs. 3884-7743.
[0073] Another embodiment of the present invention is a purified nucleic acid encoding a polypeptide comprising a sequence selected from the group consisting of the sequences of SEQ ID NOs. 5199-5919.
[0074] Another embodiment of the present invention is a purified nucleic acid encoding a polypeptide comprising a mature protein included in a sequence selected from the group consisting of the sequences of SEQ ID NOs. 5199-5919.

[0075] Another embodiment of the present invention is a purified nucleic acid encoding a polypeptide comprising a signal peptide included in a sequence selected from the group consisting of the sequences of SEQ ID NOs. 3884-4243 and 5199-5919.
[0076] Another embodiment of the present invention is a purified nucleic acid of at least 15, 18, 20, 23, 25, 28, 30, 35, 40, 50, 75, 100, 200, 300, 500 or 1000 nucleotides in length which hybridizes under stringent conditions to a sequence selected from the group consisting of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335 and sequences complementary to the sequences of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335.
[0077] Another embodiment of the present invention is a purified vertebrate nucleic acid of at least 15, 18, 20, 23, 25, 28, 30, 35, 40, 50, 75, 100, 200, 300, 500 or 1000 nucleotides in length which hybridizes under stringent conditions to a sequence selected from the group consisting of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335 and sequences complementary to the sequences of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335.

[0078] Another embodiment of the present invention is a purified human nucleic acid of at least 15, 18, 20, 23, 25, 28, 30, 35, 40, 50, 75, 100, 200, 300, 500 or 1000 nucleotides in length which hybridizes under stringent conditions to a sequence selected from the group consisting of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335 and sequences complementary to the sequences of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335.

[0079] Another embodiment of the present invention is a purified or isolated polypeptide comprising a sequence selected from the group consisting of the sequences of SEQ ID NOs. 3884-7743.
[0080] Another embodiment of the present invention is a purified or isolated polypeptide comprising a sequence selected from the group consisting of SEQ ID NOs. 5199-5919.

[0081] Another embodiment of the present invention is a purified or isolated polypeptide comprising a mature protein of a polypeptide selected from the group consisting of SEQ ID NOs. 5199-5919.
[0082] Another embodiment of the present invention is a purified or isolated polypeptide comprising a signal peptide of a sequence selected from the group consisting of the polypeptides of SEQ ID NOs. 3884-4243 and 5199-5919.

Another embodiment of the present invention is a purified or isolated polypeptide comprising at least 5, 8, 10, 12, 15, 18, 20, 23, 25, 28, 30, 35, 40, 50, 75, 100, 200, 300, 500, or 1000 consecutive amino acids, to the extent that fragments of these lengths are consistent with the specific sequence, of a sequence selected from the group consisting of the sequences of SEQ ID NOs. 3884-7743.

[0083] Another embodiment of the present invention is a method of making a cDNA comprising the steps of contacting a collection of mRNA molecules from human cells with a primer comprising at least 12, 15, 18, 20, 23, 25, 28, 30, 35, 40, or 50 consecutive nucleotides of a sequence selected from the group consisting of the sequences complementary to SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335, hybridizing said primer to an mRNA in said collection that encodes said protein, reverse transcribing said hybridized primer to make a first cDNA strand from said mRNA, making a second cDNA strand complementary to said first cDNA strand, and isolating the resulting cDNA encoding said protein comprising said first cDNA strand and said second cDNA strand.
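The primer in the method above is drawn from the strand complementary to a 5' EST, so that it can anneal to the mRNA from which the EST was derived. The following is a minimal Python sketch of that selection step, not the procedure actually used: the function names and the example sequence are hypothetical, only the 12-nucleotide minimum comes from the text, and practical primer design would also weigh melting temperature, GC content and secondary structure.

```python
# Minimal sketch (hypothetical helper names): enumerate candidate first-strand
# primers made of k consecutive nucleotides of the sequence complementary to an EST.
from typing import Iterator

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def candidate_primers(est: str, k: int = 20) -> Iterator[str]:
    """Yield every k-mer of the reverse complement of `est`.

    Each k-mer is antisense to the EST, so it can anneal to the mRNA
    the EST was derived from and prime reverse transcription.
    """
    if k < 12:
        raise ValueError("the text describes primers of at least 12 nucleotides")
    antisense = reverse_complement(est.upper())
    for i in range(len(antisense) - k + 1):
        yield antisense[i:i + k]


if __name__ == "__main__":
    est = "ATGGCGTACGTTAGCCTAGGATCCAAGT"  # hypothetical EST fragment
    for primer in candidate_primers(est, k=20):
        # The reverse complement of each primer occurs in the EST itself.
        print(primer, reverse_complement(primer) in est)
```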
[0084] Another embodiment of the present invention is a purified cDNA obtainable by the method of the preceding paragraph. In one aspect of this embodiment, the cDNA encodes at least a portion of a human polypeptide.

[0085] Another embodiment of the present invention is a purified cDNA obtained by a method of making a cDNA of the invention. In one aspect of this embodiment, the cDNA encodes at least a portion of a human polypeptide.

[0086] Another embodiment of the present invention is a method of making a cDNA comprising the steps of contacting a cDNA collection with a detectable probe comprising at least 12, 15, 18, 20, 23, 25, 28, 30, 35, 40, or 50 consecutive nucleotides of a sequence selected from the group consisting of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335 and the sequences complementary to SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335 under conditions which permit said probe to hybridize to cDNA, identifying a cDNA which hybridizes to said detectable probe, and isolating said cDNA which hybridizes to said probe.
[0087] Another embodiment of the present invention is a purified cDNA obtainable by the method of the preceding paragraph. In one aspect of this embodiment, the cDNA encodes at least a portion of a human polypeptide.

[0088] Another embodiment of the present invention is a method of making a cDNA comprising the steps of contacting a collection of mRNA molecules from human cells with a first primer capable of hybridizing to the polyA tail of said mRNA, hybridizing said first primer to said polyA tail, reverse transcribing said mRNA to make a first cDNA strand, making a second cDNA strand complementary to said first cDNA strand using at least one primer comprising at least 12, 15, 18, 20, 23, 25, 28, 30, 35, 40, or 50 consecutive nucleotides of a sequence selected from the group consisting of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335, and isolating the resulting cDNA comprising said first cDNA strand and said second cDNA strand. In another aspect of this method, the second cDNA strand is made by contacting said first cDNA strand with a second primer comprising at least 12, 15, 18, 20, 23, 25, 28, 30, 35, 40, or 50 consecutive nucleotides of a sequence selected from the group consisting of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335 and a third primer whose sequence is fully included within the sequence of said first primer, performing a first polymerase chain reaction with said second and third primers to generate a first PCR product, contacting said first PCR product with a fourth primer comprising at least 12, 15, 18, 20, 23, 25, 28, 30, 35, 40, or 50 consecutive nucleotides of said sequence selected from the group consisting of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335 and a fifth primer whose sequence is fully included within the sequence of said third primer, wherein said fourth and fifth primers hybridize to sequences within said first PCR product, and performing a second polymerase chain reaction, thereby generating a second PCR product. Alternatively, the second cDNA strand may be made by contacting said first cDNA strand with a second primer comprising at least 12, 15, 18, 20, 23, 25, 28, 30, 35, 40, or 50 consecutive nucleotides of a sequence selected from the group consisting of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335 and a third primer whose sequence is fully included within the sequence of said first primer, and performing a polymerase chain reaction with said second and third primers to generate said second cDNA strand. Alternatively, the second cDNA strand may be made by contacting said first cDNA strand with a second primer comprising at least 12, 15, 18, 20, 23, 25, 28, 30, 35, 40, or 50 consecutive nucleotides of a sequence selected from the group consisting of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335, hybridizing said second primer to said first strand cDNA, and extending said hybridized second primer to generate said second cDNA strand.
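The nested variant above is defined by containment rules between its five primers. The sketch below checks those rules for a candidate primer set, assuming each primer is a plain 5'->3' string and that the second and fourth primers consist entirely of EST sequence. The helper name, the anchor sequence and the "inner primer not upstream of outer primer" check are illustrative assumptions, not part of the described method.

```python
# Minimal sketch (hypothetical helper): sanity-check the containment rules of the
# nested PCR variant. Primers are plain 5'->3' strings.
from typing import List


def check_nested_design(first: str, second: str, third: str,
                        fourth: str, fifth: str, est: str,
                        min_len: int = 12) -> List[str]:
    """Return a list of rule violations; an empty list means the design passes."""
    first, second, third, fourth, fifth, est = (
        s.upper() for s in (first, second, third, fourth, fifth, est))
    problems = []
    # Third primer: sequence fully included within the first (oligo-dT-based) primer.
    if third not in first:
        problems.append("third primer is not fully included in the first primer")
    # Fifth primer: sequence fully included within the third primer.
    if fifth not in third:
        problems.append("fifth primer is not fully included in the third primer")
    # Second and fourth primers: assumed to be spans of EST sequence of at
    # least min_len nucleotides (sense strand, so they prime on the first cDNA strand).
    for name, primer in (("second", second), ("fourth", fourth)):
        if len(primer) < min_len or primer not in est:
            problems.append(f"{name} primer is not a span of >= {min_len} EST nucleotides")
    # Assumption: the inner (fourth) primer should not start upstream of the outer one.
    if second in est and fourth in est and est.find(fourth) < est.find(second):
        problems.append("fourth primer lies upstream of the second primer")
    return problems


if __name__ == "__main__":
    anchor = "GACTCGAGTCGACATCG"          # hypothetical anchor sequence
    first = anchor + "T" * 17             # hypothetical anchored oligo-dT primer
    est = "ATGGCGTACGTTAGCCTAGGATCCAAGTCGGA"  # hypothetical EST
    print(check_nested_design(first, est[0:20], anchor, est[4:24], anchor[4:], est))
    # prints []
```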

[0089] Another embodiment of the present invention is a purified cDNA obtainable by a method of making a cDNA of the invention. In one aspect of this embodiment, said cDNA encodes at least a portion of a human polypeptide.

Another embodiment of the present invention is a method of making a polypeptide comprising the steps of obtaining a cDNA which encodes a polypeptide encoded by a nucleic acid comprising a sequence selected from the group consisting of SEQ ID NOs. 24-3883 or a cDNA which encodes a polypeptide comprising at least 6, 8, 10, 12, 15, 18, 20, 23, 25, 28, 30, 35, 40, or 50 consecutive amino acids of a polypeptide encoded by a sequence selected from the group consisting of SEQ ID NOs. 24-3883, inserting said cDNA in an expression vector such that said cDNA is operably linked to a promoter, introducing said expression vector into a host cell whereby said host cell produces the protein encoded by said cDNA, and isolating said protein.
[0090] Another embodiment of the present invention is a method of obtaining a promoter DNA comprising the steps of obtaining genomic DNA located upstream of a nucleic acid comprising a sequence selected from the group consisting of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335 and the sequences complementary to the sequences of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335, screening said genomic DNA to identify a promoter capable of directing transcription initiation, and isolating said DNA comprising said identified promoter.

[0091] In one aspect of this embodiment, said obtaining step comprises walking from genomic DNA comprising a sequence selected from the group consisting of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335 and the sequences complementary to SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335. In another aspect of this embodiment, said screening step comprises inserting genomic DNA located upstream of a sequence selected from the group consisting of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335 and the sequences complementary to SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335 into a promoter reporter vector. For example, said screening step may comprise identifying motifs in genomic DNA located upstream of a sequence selected from the group consisting of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335 and the sequences complementary to SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335 which are transcription factor binding sites or transcription start sites.
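The motif-identification variant of the screening step can be pictured as a pattern search over the upstream genomic DNA. The sketch below uses Python regular expressions. The single motif in the table (a TATA-box-like consensus, TATAWAW with W = A or T) is an illustrative assumption, not the motif set used in the original work, and only the given strand is scanned. A fuller analysis would scan both strands and typically score position weight matrices rather than exact patterns.

```python
# Minimal sketch: scan DNA upstream of an EST for candidate transcription
# factor binding motifs with regular expressions. The motif table is an
# illustrative assumption; only the strand passed in is scanned.
import re
from typing import Dict, List, Tuple

EXAMPLE_MOTIFS: Dict[str, str] = {
    "TATA-like": "TATA[AT]A[AT]",  # TATAWAW, W = A or T
}


def find_motifs(upstream: str,
                motifs: Dict[str, str] = EXAMPLE_MOTIFS) -> List[Tuple[str, int, str]]:
    """Return (motif name, 0-based start, matched text) for every hit, overlaps included."""
    seq = upstream.upper()
    hits = []
    for name, pattern in motifs.items():
        # A zero-width lookahead lets overlapping occurrences be reported.
        for match in re.finditer(f"(?=({pattern}))", seq):
            hits.append((name, match.start(), match.group(1)))
    return hits


if __name__ == "__main__":
    print(find_motifs("GGGTATAAAAGGC"))  # hypothetical upstream fragment
```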
[0092] Another embodiment of the present invention is an isolated promoter obtainable by the methods of the above paragraphs.

[0093] Another embodiment of the present invention is an isolated promoter obtained by the methods described in the above paragraphs.
[0094] Another embodiment of the present invention is the inclusion of at least one sequence selected from the group consisting of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335, the sequences complementary to the sequences of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335, and fragments comprising at least 12, 15, 18, 20, 23, 25, 28, 30, 35, 40, 50, or 100 consecutive nucleotides of said sequences in an array of discrete ESTs or fragments thereof of at least 12, 15, 18, 20, 23, 25, 28, 30, 35, 40, 50, or 100 nucleotides in length. In some aspects of this embodiment, the array includes at least two sequences selected from the group consisting of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335, the sequences complementary to the sequences of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335, and fragments comprising at least 12, 15, 18, 20, 23, 25, 28, 30, 35, 40, 50, or 100 consecutive nucleotides of said sequences. In another aspect of this embodiment, the array includes at least one, three, five, ten, fifteen, or twenty sequences selected from the group consisting of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335, the sequences complementary to the sequences of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335, and fragments comprising at least 12, 15, 18, 20, 23, 25, 28, 30, 35, 40, 50, or 100 consecutive nucleotides of said sequences.
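One simple way to derive array fragments from a set of ESTs and their complementary strands is to tile fixed-length windows along each strand. The sketch below does that; the helper names, the default 25-nucleotide length and the step parameter are assumptions for illustration (the text itself lists lengths from 12 to 100 nucleotides). Real probe selection would also filter for uniqueness, melting temperature and cross-hybridization.

```python
# Minimal sketch (hypothetical helpers): tile fixed-length fragments from both
# strands of a set of ESTs as candidate array probes.
from typing import Iterable, List, Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def tile_fragments(seq: str, length: int = 25, step: Optional[int] = None) -> List[str]:
    """Return `length`-nt fragments starting every `step` nt (non-overlapping by default)."""
    step = step or length
    return [seq[i:i + length] for i in range(0, len(seq) - length + 1, step)]


def design_probe_set(ests: Iterable[str], length: int = 25,
                     step: Optional[int] = None) -> List[str]:
    """Collect unique fragments from each EST and from its complementary strand."""
    probes = set()
    for est in ests:
        est = est.upper()
        probes.update(tile_fragments(est, length, step))
        probes.update(tile_fragments(reverse_complement(est), length, step))
    return sorted(probes)


if __name__ == "__main__":
    print(tile_fragments("ACGTACGTAC", length=4, step=2))
    print(design_probe_set(["AAAATTTT"], length=4))
```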
[0095] Another embodiment of the present invention is an enriched population of recombinant nucleic acids, said recombinant nucleic acids comprising an insert nucleic acid and a backbone nucleic acid, wherein at least 0.01%, 0.05%, 0.1%, 0.5%, 1%, 2%, 5%, 10%, or 20% of said insert nucleic acids in said population comprise a sequence selected from the group consisting of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335 and the sequences complementary to SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335.
[0096] Another embodiment of the present invention is a purified or isolated antibody capable of specifically binding to a polypeptide comprising a sequence selected from the group consisting of SEQ ID NOs. 3884-7743.

[0097] Another embodiment of the present invention is a purified or isolated antibody capable of specifically binding to a polypeptide comprising at least 6, 8, 10, 12, 15, 18, 20, 23, 25, 28, 30, 35, 40, or 50 consecutive amino acids of a sequence selected from the group consisting of SEQ ID NOs. 3884-7743.
[0098] Yet another embodiment of the present invention is an antibody composition capable of selectively binding to an epitope-containing fragment of a polypeptide comprising a contiguous span of at least 8, 10, 12, 15, 18, 20, 23, 25, 28, 30, 35, 40, or 50 amino acids of any of SEQ ID NOs. 3884-7743, wherein said antibody is polyclonal or monoclonal.

[0099] Another embodiment of the present invention is a computer readable medium having stored thereon a sequence selected from the group consisting of a nucleic acid code of SEQ ID NOs. 24-3883 and 7744-19335 and a polypeptide code of SEQ ID NOs. 3884-7743.

[0100] Another embodiment of the present invention is a computer system comprising a processor and a data storage device, wherein said data storage device has stored thereon a sequence selected from the group consisting of a nucleic acid code of SEQ ID NOs. 24-3883 and 7744-19335 and a polypeptide code of SEQ ID NOs. 3884-7743. In one aspect of this embodiment, the computer system further comprises a sequence comparer and a data storage device having reference sequences stored thereon. For example, the sequence comparer may comprise a computer program which indicates polymorphisms.

[0101] In another aspect of this embodiment, the computer system further comprises an identifier which identifies features in said sequence.
[0102] Another embodiment of the present invention is a method for comparing a first sequence to a reference sequence, wherein said first sequence is selected from the group consisting of a nucleic acid code of SEQ ID NOs. 24-3883 and 7744-19335 and a polypeptide code of SEQ ID NOs. 3884-7743, comprising the steps of reading said first sequence and said reference sequence through use of a computer program which compares sequences and determining differences between said first sequence and said reference sequence with said computer program. In some aspects of this embodiment, said step of determining differences between the first sequence and the reference sequence comprises identifying polymorphisms.
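The "determining differences" step can be illustrated once the two sequences are aligned. This sketch assumes they have already been aligned to equal length with gaps written as "-". A real sequence comparer would first perform the alignment itself, for instance with BLAST or a Smith-Waterman implementation, which is outside this sketch. Ambiguous "n" positions are skipped rather than reported, an assumption consistent with the meaning of "n" in the Sequence Listing.

```python
# Minimal sketch: report differences between a first sequence and a reference
# sequence that have ALREADY been aligned to the same length (gaps as "-").
from typing import List, Tuple


def find_differences(first: str, reference: str) -> List[Tuple[int, str, str, str]]:
    """Return (1-based position, reference base, observed base, kind) per difference."""
    if len(first) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    diffs = []
    for pos, (obs, ref) in enumerate(zip(first.upper(), reference.upper()), start=1):
        if obs == ref or "N" in (obs, ref):
            continue  # identical, or ambiguous "n": not scored as a polymorphism
        kind = "indel" if "-" in (obs, ref) else "substitution"
        diffs.append((pos, ref, obs, kind))
    return diffs


if __name__ == "__main__":
    print(find_differences("ACGTTA", "ACGCTA"))
    print(find_differences("AC-T", "ACGT"))
```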

[0103] Another embodiment of the present invention is a method for identifying a feature in a sequence selected from the group consisting of a nucleic acid code of SEQ ID NOs. 24-3883 and 7744-19335 and a polypeptide code of SEQ ID NOs. 3884-7743, comprising the steps of reading said sequence through the use of a computer program which identifies features in sequences and identifying features in said sequence with said computer program.

[0104] Another embodiment of the present invention is a vector comprising a nucleic acid according to any one of the nucleic acids described above.
[0105] Another embodiment of the present invention is a host cell containing the above vector.

[0106] Another embodiment of the present invention is a method of making any of the nucleic acids described above comprising the steps of introducing said nucleic acid into a host cell such that said nucleic acid is present in multiple copies in each host cell and isolating said nucleic acid from said host cell.

[0107] Another embodiment of the present invention is a method of making a nucleic acid of any of the nucleic acids described above comprising the step of sequentially linking together the nucleotides in said nucleic acids.

[0108] Another embodiment of the present invention is a method of making any of the polypeptides described above, wherein said polypeptide is 150 amino acids in length or less, comprising the step of sequentially linking together the amino acids in said polypeptide.

[0109] Another embodiment of the present invention is a method of making any of the polypeptides described above, wherein said polypeptide is 120 amino acids in length or less, comprising the step of sequentially linking together the amino acids in said polypeptide.

Brief Description of the Sequence Listing 



[0110] SEQ ID NOs. 1, 3, 5, 7, 9, 11, and 13 are full-length cDNAs prepared using the methods described herein.

[0111] SEQ ID NOs. 2, 4 and 6 are the signal peptides encoded by the nucleic acids of SEQ ID NOs. 1, 3 and 5, respectively.

[0112] SEQ ID NOs. 8, 10, 12, and 14 are the polypeptides encoded by the nucleic acids of SEQ ID NOs. 7, 9, 11, and 13, respectively.

[0113] SEQ ID NOs. 15, 16, 18, 19, 21 and 22 are primers whose use is described in the specification.
[0114] SEQ ID NOs. 17, 20, and 23 are the sequences of nucleic acids containing transcription factor binding sites which were obtained as described below.
[0115] SEQ ID NOs. 24-383 are nucleic acids having an incomplete ORF which encodes a signal peptide. As used herein, an "incomplete ORF" is an open reading frame in which a start codon has been identified but no stop codon has been identified. The locations of the incomplete ORFs and of the sequences encoding signal peptides are listed in the accompanying Sequence Listing. In addition, the von Heijne score of the signal peptide, computed as described below, is listed as the "score" in the accompanying Sequence Listing. The sequence of the signal peptide is listed as "seq" in the accompanying Sequence Listing. The "/" in the signal peptide sequence indicates the location where proteolytic cleavage of the signal peptide occurs to generate a mature protein.

EP1 104 808 A1
[0116] SEQ ID NOs. 384-1338 are nucleic acids having an incomplete ORF in which no sequence encoding a signal peptide has been identified to date. However, it remains possible that subsequent analysis will identify a sequence encoding a signal peptide in these nucleic acids. The locations of the incomplete ORFs are listed in the accompanying Sequence Listing.
[0117] SEQ ID NOs. 1339-2059 are nucleic acids having a complete ORF which encodes a signal peptide. As used herein, a "complete ORF" is an open reading frame in which a start codon and a stop codon have been identified. The locations of the complete ORFs and of the sequences encoding signal peptides are listed in the accompanying Sequence Listing. In addition, the von Heijne score of the signal peptide, computed as described below, is listed as the "score" in the accompanying Sequence Listing. The sequence of the signal peptide is listed as "seq" in the accompanying Sequence Listing. The "/" in the signal peptide sequence indicates the location where proteolytic cleavage of the signal peptide occurs to generate a mature protein.
[0118] SEQ ID NOs. 2060-3883 are nucleic acids having a complete ORF in which no sequence encoding a signal peptide has been identified to date. However, it remains possible that subsequent analysis will identify a sequence encoding a signal peptide in these nucleic acids. The locations of the complete ORFs are listed in the accompanying Sequence Listing.
[0119] SEQ ID NOs. 3884-4243 are "incomplete polypeptide sequences" which include a signal peptide. "Incomplete polypeptide sequences" are polypeptide sequences encoded by nucleic acids in which a start codon has been identified but no stop codon has been identified. These polypeptides are encoded by the nucleic acids of SEQ ID NOs. 24-383. The location of the signal peptide is listed in the accompanying Sequence Listing.

[0120] SEQ ID NOs. 4244-5198 are incomplete polypeptide sequences in which no signal peptide has been identified to date. However, it remains possible that subsequent analysis will identify a signal peptide in these polypeptides. These polypeptides are encoded by the nucleic acids of SEQ ID NOs. 384-1338.
[0121] SEQ ID NOs. 5199-5919 are "complete polypeptide sequences" which include a signal peptide. "Complete polypeptide sequences" are polypeptide sequences encoded by nucleic acids in which a start codon and a stop codon have been identified. These polypeptides are encoded by the nucleic acids of SEQ ID NOs. 1339-2059. The location of the signal peptide is listed in the accompanying Sequence Listing.
[0122] SEQ ID NOs. 5920-7743 are complete polypeptide sequences in which no signal peptide has been identified to date. However, it remains possible that subsequent analysis will identify a signal peptide in these polypeptides. These polypeptides are encoded by the nucleic acids of SEQ ID NOs. 2060-3883.
[0123] SEQ ID NOs. 7744-19335 are nucleic acid sequences in which no open reading frame of at least 150 nucleotides has been conclusively identified to date. However, it remains possible that subsequent analysis will identify an open reading frame in these nucleic acids.
[0124] In the accompanying Sequence Listing, all instances of the symbol "n" in the nucleic acid sequences mean that the nucleotide can be adenine, guanine, cytosine or thymine. In some instances the polypeptide sequences in the Sequence Listing contain the symbol "Xaa." These "Xaa" symbols indicate either (1) a residue which cannot be identified because of nucleotide sequence ambiguity or (2) a stop codon in the determined sequence where applicants believe one should not exist (if the sequence were determined more accurately). In some instances, several possible identities of the unknown amino acids may be suggested by the genetic code.
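The two sources of "Xaa" described above can be reproduced with a small translation routine over the standard genetic code. In this sketch, a codon containing "n" is expanded to every possible codon. It is translated to a residue only when all expansions agree, for example GCN always encodes Ala; otherwise it becomes "Xaa". As a hypothetical policy mirroring case (2), an internal stop codon is also written as "Xaa", while a final stop codon ends translation. Other IUPAC ambiguity codes are not handled.

```python
# Minimal sketch: translate a nucleotide sequence into three-letter residue
# codes, emitting "Xaa" for ambiguous codons and (hypothetical policy) for
# internal stop codons. Standard genetic code only.
from itertools import product
from typing import List, Set

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order ("*" = stop).
AA1 = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA1)}
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def possible_residues(codon: str) -> Set[str]:
    """Expand each "N" to A/C/G/T and return every residue the codon could encode."""
    choices = []
    for base in codon.upper():
        if base == "N":
            choices.append("ACGT")
        elif base in "ACGT":
            choices.append(base)
        else:
            return set()  # other IUPAC codes are not handled in this sketch
    return {CODON_TABLE["".join(c)] for c in product(*choices)}


def translate(seq: str) -> List[str]:
    """Translate complete codons; a terminal stop codon ends the protein."""
    n_codons = len(seq) // 3
    residues = []
    for i in range(n_codons):
        options = possible_residues(seq[3 * i:3 * i + 3])
        if options == {"*"}:
            if i == n_codons - 1:
                break               # terminal stop codon ends the protein
            residues.append("Xaa")  # internal stop, treated as a sequencing error
        elif len(options) == 1:
            residues.append(THREE_LETTER[options.pop()])
        else:
            residues.append("Xaa")  # ambiguous or unhandled codon
    return residues


if __name__ == "__main__":
    print(translate("ATGGCNTAA"))
    print(translate("ATGNNNTGGTGA"))
```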

[0125] In the case of secreted proteins, it should be noted that, in accordance with the regulations governing Sequence Listings, in the appended Sequence Listing the encoded protein (i.e., the protein containing the signal peptide and the mature protein or part thereof) extends from an amino acid residue having a negative number through a positively numbered amino acid residue. Thus, the first amino acid of the mature protein resulting from cleavage of the signal peptide is designated as amino acid number 1, and the first amino acid of the signal peptide is designated with the appropriate negative number.
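The numbering convention just described can be checked with a one-line conversion. Counting residues of the full precursor from 1, a signal peptide of length L occupies listing numbers -L through -1, and the mature protein starts at 1, so no residue is numbered 0. The helper below is a hypothetical illustration of that arithmetic.

```python
# Minimal sketch (hypothetical helper): convert a residue's 1-based position in
# the full precursor into Sequence Listing numbering, where the signal peptide
# runs from -signal_len to -1 and the mature protein starts at 1.
def listing_number(position: int, signal_len: int) -> int:
    if position < 1:
        raise ValueError("positions are 1-based")
    if position <= signal_len:
        return position - signal_len - 1  # signal peptide: negative numbers
    return position - signal_len          # mature protein: 1, 2, 3, ...


if __name__ == "__main__":
    # A 3-residue signal peptide followed by the mature protein.
    print([listing_number(i, 3) for i in range(1, 6)])  # prints [-3, -2, -1, 1, 2]
```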

Brief Description of the Drawings 



[0126] Figure 1 summarizes the computer analysis 
procedure for obtaining consensus contigated ESTs. 
[0127] Figure 2 is an analysis of the 43 amino terminal amino acids of all human SwissProt proteins to determine the frequency of false positives and false negatives using the techniques for signal peptide identification described herein.

[0128] Figure 3 summarizes a general RT-PCR-based method used to clone and sequence extended cDNAs containing sequences adjacent to 5'ESTs.

[0129] Figure 4 provides a schematic description of the promoters isolated and the way they are assembled with the corresponding 5'ESTs.

[0130] Figure 5 describes the transcription factor binding sites present in each of the promoters of Figure 4.

[0131] Figure 6 is a block diagram of an exemplary 
computer system. 

[0132] Figure 7 is a flow diagram illustrating one embodiment of a process 200 for comparing a new nucleotide or protein sequence with a database of sequences in order to determine the homology levels between the new sequence and the sequences in the database.
[0133] Figure 8 is a flow diagram illustrating one embodiment of a process 250 in a computer for determining whether two sequences are homologous.
[0134] Figure 9 is a flow diagram illustrating one embodiment of an identifier process 300 for detecting the presence of a feature in a sequence.
[0135] Figure 10 is a table describing algorithms, parameters and criteria that can be used for each step of extended cDNA analysis.

Detailed Description of the Preferred Embodiment 

I. General Methods for Obtaining 5' ESTs derived from mRNAs with Intact 5' ends

[0136] The 5'ESTs of the present invention were obtained from cDNA libraries derived from mRNAs having intact 5' ends, as described in Examples 1 to 5, using either a chemical or an enzymatic approach.

EXAMPLE 1 

Preparation of mRNA 

[0137] Total human RNAs or polyA+ RNAs derived from different tissues were purchased from LABIMO and CLONTECH, respectively, and used to generate cDNA libraries as described below. The purchased RNA had been isolated from cells or tissues using acid guanidinium thiocyanate-phenol-chloroform extraction (Chomczynski and Sacchi, Analytical Biochemistry 162:156-159, 1987). PolyA+ RNA was isolated from total RNA (LABIMO) by two passes of oligo dT chromatography, as described by Aviv and Leder (Proc. Natl. Acad. Sci. USA 69:1408-1412, 1972), in order to eliminate ribosomal RNA.

[0138] The quality and the integrity of the polyA+ RNAs were checked. Northern blots hybridized with a probe corresponding to a ubiquitous mRNA, such as elongation factor 1 or elongation factor 2, were used to confirm that the mRNAs were not degraded. Contamination of the polyA+ mRNAs by ribosomal sequences was checked using Northern blots and a probe derived from the sequence of the 28S rRNA. Preparations of mRNAs with less than 5% rRNA were used in library construction. To avoid constructing libraries with RNAs contaminated by exogenous sequences (prokaryotic or fungal), the presence of bacterial 16S ribosomal sequences and of two highly expressed fungal mRNAs was examined using PCR.

EXAMPLE 2 

Methods for Obtaining mRNAs having intact 5' Ends 

[0139] Following preparation of the mRNAs from various tissues as described above, selection of mRNA with intact 5' ends and specific attachment of an oligonucleotide tag to the 5' end of said mRNA is performed using either a chemical or an enzymatic approach. Both techniques take advantage of the presence of the "cap" structure, which characterizes the 5' end of intact mRNAs and which comprises a guanosine generally methylated once, at the 7 position.
[0140] The chemical modification approach involves the optional elimination of the 2',3'-cis diol of the 3' terminal ribose, the oxidation of the 2',3'-cis diol of the ribose linked to the cap of the 5' ends of the mRNAs into a dialdehyde, and the coupling of the dialdehyde thus obtained to a derivatized oligonucleotide tag. Further detail regarding the chemical approaches for obtaining mRNAs having intact 5' ends is disclosed in International Application No. WO 96/34981, published November 7, 1996, the disclosure of which is incorporated herein by reference in its entirety.
[0141] The enzymatic approach for ligating the oligonucleotide tag to the 5' ends of mRNAs with intact 5' ends involves the removal of the phosphate groups present on the 5' ends of uncapped incomplete mRNAs, the subsequent decapping of mRNAs with intact 5' ends, and the ligation of the phosphate present at the 5' end of the decapped mRNA to an oligonucleotide tag. Further detail regarding the enzymatic approaches for obtaining mRNAs having intact 5' ends is disclosed in Dumas Milne Edwards J.B. (Doctoral Thesis of Paris VI University, Le clonage des ADNc complets: difficultés et perspectives nouvelles. Apports pour l'étude de la régulation de l'expression de la tryptophane hydroxylase de rat, 20 Dec. 1993), EP 0 625 572 and Kato et al., Gene 150:243-250 (1994), the disclosures of which are incorporated herein by reference in their entireties.
[0142] In either the chemical or the enzymatic approach, the oligonucleotide tag has a restriction enzyme site (e.g., EcoRI sites) therein to facilitate later cloning procedures. Following attachment of the oligonucleotide tag to the mRNA, the integrity of the mRNA was then examined by performing a Northern blot using a probe complementary to the oligonucleotide tag.

EXAMPLE 3 

cDNA Synthesis Using mRNA Templates Having Intact 5' Ends

[0143] For the mRNAs joined to oligonucleotide tags using either the chemical or the enzymatic method, first strand cDNA synthesis was performed using a thermostable reverse transcriptase with an oligo-dT primer. In some instances, this oligo-dT primer contained an internal tag of at least 4 nucleotides which differs from one tissue to another. In order to protect internal EcoRI sites in the cDNA from digestion at later steps in the procedure, methylated dCTP was used for first strand synthesis. After removal of RNA by alkaline hydrolysis, the first strand of cDNA was precipitated using isopropanol in order to eliminate residual primers.
[0144] The second strand of the cDNA was then synthesized with a Klenow fragment using a primer corresponding to the 5' end of the ligated oligonucleotide. Preferably, the primer is 20-25 bases in length. Methylated dCTP was also used for second strand synthesis in order to protect internal EcoRI sites in the cDNA from digestion during the cloning process.

EXAMPLE 4 

Cloning of cDNAs derived from mRNA with intact 5' ends

[0145] Following second strand synthesis, the cDNAs were cloned into the phagemid pBlueScript II SK vector (Stratagene) or one of its derivatives. The ends of the cDNAs were blunted with T4 DNA polymerase (Biolabs) and the cDNA was digested with EcoRI. Since methylated dCTP was used during cDNA synthesis, the EcoRI site present in the tag was the only hemi-methylated site, and hence the only site susceptible to EcoRI digestion. In some instances, to facilitate subcloning, a Hind III adaptor was added to the 3' end of the cDNAs.
[0146] The cDNAs were then size fractionated using either exclusion chromatography (AcA, Biosepra) or electrophoretic separation, which yields 3 or 6 different fractions. The cDNAs were then directionally cloned into pBlueScript using either the EcoRI and SmaI restriction sites or, when the Hind III adaptor was present in the cDNAs, the EcoRI and Hind III restriction sites. The ligation mixture was electroporated into bacteria and propagated under appropriate antibiotic selection.

EXAMPLE 5 

Selection of Clones Having the Oligonucleotide Tag Attached Thereto

[0147] Clones containing the oligonucleotide tag attached to cDNAs were then selected as follows.
[0148] The plasmid DNAs containing cDNA libraries made as described above were purified (Qiagen). A positive selection of the tagged clones was performed as follows. Briefly, in this selection procedure, the plasmid DNA was converted to single stranded DNA using gene II endonuclease of the phage F1 in combination with an exonuclease (Chang et al., Gene 127:95-8, 1993) such as exonuclease III or T7 gene 6 exonuclease. The resulting single stranded DNA was then purified using paramagnetic beads as described by Fry et al., Biotechniques, 13:124-131, 1992. In this procedure, the single stranded DNA was hybridized with a biotinylated oligonucleotide having a sequence corresponding to the 3' end of the oligonucleotide tag described in Example 2. Preferably, this biotinylated oligonucleotide has a length of 20-25 bases. Clones including a sequence complementary to the biotinylated oligonucleotide were captured by incubation with streptavidin-coated magnetic beads followed by magnetic selection. After capture of the positive clones, the plasmid DNA was released from the magnetic beads and converted into double stranded DNA using a DNA polymerase such as the ThermoSequenase obtained from Amersham Pharmacia Biotech. Alternatively, protocols such as the Gene Trapper kit (Gibco BRL) may be used. The double stranded DNA was then electroporated into bacteria. The percentage of positive clones having the 5' tag oligonucleotide, estimated using dot blot analysis, typically ranged between 90 and 98%.

[0149] Following electroporation, the libraries were arrayed in 384-well microtiter plates (MTPs). A copy of each MTP was stored for future needs. The libraries were then transferred into 96-well MTPs and sequenced.

EXAMPLE 6 

Sequencing of Inserts in Selected Clones 

[0150] Plasmid inserts were first amplified by PCR on PE-9600/PE-9700 thermocyclers (PE Biosystems, Applied Biosystems Division, Foster City, CA) or Tetrad thermocyclers (MJ Research), using L7AF3 and SETA primers (Genset SA), ExTaq polymerase (Takara), dNTPs (Boehringer), and buffer and cycling conditions as recommended by the PE Biosystems Corporation. PCR products were then sequenced using MegaBace capillary sequencers (Molecular Dynamics). Sequencing reactions were performed using PE-9600/PE-9700 thermocyclers with ET (Energy Transfer) primer chemistry and ThermoSequenase (Amersham Pharmacia Biotech). The primer used was the Reverse Primer (RP) (Amersham Pharmacia Biotech), as appropriate. The dNTPs and ddNTPs used in the sequencing reactions were purchased from Boehringer. Sequencing buffer, reagent concentrations, and cycling conditions were as recommended by Amersham. Following the sequencing reaction, the samples were purified with Sephadex G-50 and injected into the capillaries of the MegaBace. Injection was performed for 12 seconds at 10000 V, and electrophoresis for 100 minutes at 10000 V. The sequence data were collected and analyzed using the Instrument Control Manager analysis software of the MegaBace before being processed by Genset's proprietary sequence verification software.

[0151] Alternatively, plasmid inserts were first amplified by PCR on PE-9600 thermocyclers (PE Biosystems, Applied Biosystems Division, Foster City, CA), using standard SETA-A and SETA-B primers (Genset SA), AmpliTaq Gold (PE Biosystems), dNTPs (Boehringer), and buffer and cycling conditions as recommended by the PE Biosystems Corporation. PCR products were then sequenced using automatic ABI Prism 377 sequencers (PE Biosystems). Sequencing reactions were performed using PE 9600 thermocyclers with standard dye-primer chemistry and ThermoSequenase (Amersham Pharmacia Biotech). The primers used were either T7 or 21M13 (available from Genset SA), as appropriate. The primers were labeled with the JOE, FAM, ROX, and TAMRA dyes. The dNTPs and ddNTPs used in the sequencing reactions were purchased from Boehringer. Sequencing buffer, reagent concentrations, and cycling conditions were as recommended by Amersham. Following the sequencing reaction, the samples were precipitated with ethanol, resuspended in formamide loading buffer, and loaded on a standard 4% acrylamide gel. Electrophoresis was performed for 2.5 hours at 3000 V on an ABI 377 sequencer, and the sequence data were collected and analyzed using the ABI Prism DNA Sequencing Analysis Software, version 2.1.2.

II. Computer Analysis of the Isolated 5' ESTs 

[0152] The sequence data from the different cDNA libraries made as described above were transferred to a database, where quality control and validation steps were performed. A proprietary base-caller running on a Unix system automatically flagged suspect peaks, taking into account the shape of the peaks, the inter-peak resolution, and the noise level. The base-caller also performed automatic trimming: any stretch of 25 or fewer bases having more than 4 suspect peaks was considered unreliable and was discarded. Sequences corresponding to cloning vector or ligation oligonucleotides were automatically removed from the 5'EST sequences. However, the resulting 5'EST sequences may contain 1 to 5 bases belonging to the above-mentioned sequences at their 5' end. If needed, these can easily be removed on a case-by-case basis.
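
By way of illustration only, the trimming rule of paragraph [0152] may be sketched as a sliding-window check. This minimal sketch is not Genset's proprietary base-caller: it assumes that per-base suspect-peak flags are already available as a list of booleans, and it assumes that every base lying in an offending window is treated as unreliable. The function name `unreliable_mask` is hypothetical.

```python
from typing import List


def unreliable_mask(suspect: List[bool], window: int = 25, max_suspect: int = 4) -> List[bool]:
    """Return a per-base mask that is True where a base falls in a stretch of
    `window` bases containing more than `max_suspect` suspect peaks.

    Checking windows of exactly `window` bases is enough: any shorter stretch
    with too many suspect peaks lies inside some full-length window that also
    has too many. Reads shorter than `window` are checked as a single stretch.
    """
    n = len(suspect)
    mask = [False] * n
    if n == 0:
        return mask
    w = min(window, n)
    count = sum(suspect[:w])  # suspect peaks in the first window
    for start in range(n - w + 1):
        if start > 0:
            # Slide the window by one base: add the entering base, drop the leaving one.
            count += int(suspect[start + w - 1]) - int(suspect[start - 1])
        if count > max_suspect:
            for i in range(start, start + w):
                mask[i] = True
    return mask


# Hypothetical read of 40 bases with suspect peaks at positions 0-4.
flags = [i < 5 for i in range(40)]
print(sum(unreliable_mask(flags)))  # 25 bases flagged
```

Flagging whole windows is the simplest conservative choice; a base-caller could instead discard only the bases between the first and last suspect peak of each offending stretch.
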
[0153] Following sequencing as described above, the sequences of the 5' ESTs were entered in a database for storage and manipulation as described below and as depicted in Figure 1. Before searching the 5'ESTs in the database for sequences of interest, 5'ESTs derived from mRNAs which were not of interest were identified. Briefly, such undesired sequences may be of three types. First, contaminants of either endogenous (ribosomal RNAs, transfer RNAs, mitochondrial RNAs) or exogenous (prokaryotic RNAs and fungal RNAs) origin were identified. Second, uninformative sequences, namely redundant sequences, short sequences, and highly degenerate sequences, were identified. Third, repeated sequences (Alu, L1, THE and MER repeats, SSTR sequences, or satellite, micro-satellite, or telomere repeats) were identified and masked in further processing.
[0154] Then, in order to determine the accuracy of the 
sequencing procedure as well as the efficiency of the 5' 
selection described above, the analyses described in 
Examples 7 and 8 respectively were performed on 
5'ESTs. 



EXAMPLE 7 

Measurement of Sequencing Accuracy by Comparison 
to Known Sequences 

[0155] To further determine the accuracy of the sequencing procedure described in Example 6, the sequences of 5' ESTs derived from known sequences were identified and compared to the original known sequences. First, a FASTA analysis with overhangs shorter than 5 bp on both ends was conducted on the 5' ESTs to identify those matching an entry in the public human mRNA database available at the time of filing the present documents. The 5' ESTs which matched a known human mRNA were then realigned with their cognate mRNA, and dynamic programming was used to include substitutions, insertions, and deletions in the list of "errors" which would be recognized. Errors occurring in the last 10 bases of the 5' EST sequences were ignored to avoid the inclusion of spurious cloning sites in the analysis of sequencing accuracy. This analysis revealed that the sequences incorporated in the database had an accuracy of more than 99.3% using MegaBace capillary sequencers and more than 99.5% using ABI 377 sequencers.
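
A minimal sketch of the error count of paragraph [0155] is given below, for illustration only. It assumes that each 5' EST has already been placed by the FASTA search so that its 5' end coincides with the start of the cognate mRNA segment passed as `ref`; the edit-distance formulation, in which the EST may end anywhere along `ref`, and the names `count_errors` and `accuracy` are assumptions rather than the procedure actually used.

```python
from typing import Iterable, Tuple


def count_errors(est: str, ref: str, ignore_tail: int = 10) -> int:
    """Minimum number of substitutions, insertions, and deletions needed to
    align the EST, minus its last `ignore_tail` bases, to a prefix of `ref`."""
    query = est[:-ignore_tail] if ignore_tail > 0 else est
    m, n = len(query), len(ref)
    # prev is the previous dynamic-programming row: prev[j] is the edit
    # distance between the query prefix processed so far and ref[:j].
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            substitution = prev[j - 1] + (query[i - 1] != ref[j - 1])
            curr[j] = min(substitution, prev[j] + 1, curr[j - 1] + 1)
        prev = curr
    # The EST may stop anywhere along the reference, so take the best end point.
    return min(prev)


def accuracy(pairs: Iterable[Tuple[str, str]], ignore_tail: int = 10) -> float:
    """Fraction of compared EST bases that were called correctly."""
    errors = bases = 0
    for est, ref in pairs:
        errors += count_errors(est, ref, ignore_tail)
        bases += max(len(est) - ignore_tail, 0)
    return 1 - errors / bases


ref = "ACGTACGTACGTACGTACGT"
est = "ACGTACCTAC" + "GGGGGGGGGG"  # one substitution, then a 10-base tail that is ignored
print(count_errors(est, ref))  # 1
print(accuracy([(est, ref)]))
```
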

EXAMPLE 8

Determination of Efficiency of 5' EST Selection

[0156] To determine the efficiency with which the above selection procedures isolated cDNAs that include the 5' ends of their corresponding mRNAs, the sequences of the 5'ESTs were aligned with a reference pool of complete mRNAs/cDNAs extracted from EMBL release 57 using the FASTA algorithm. For each reference mRNA/cDNA, the most 5' transcription start site was obtained and compared to the position of the 5' end of the 5'EST. More than 75% of the 5'ESTs had their 5' ends close to the 5' ends of the known sequences. Because some of the mRNA sequences available in the EMBL database are deduced from genomic sequences, a 5' end matching these sequences will be counted as an internal match. Thus, the method used here underestimates the yield of 5'ESTs that include the authentic 5' ends of their corresponding mRNAs.

EXAMPLE 9

Generation of Consensus Contigated 5' ESTs

[0157] Since the cDNA libraries made above include multiple 5' ESTs derived from the same mRNA, overlapping 5'ESTs may be assembled into continuous sequences. The following method describes how to efficiently align multiple 5'ESTs in order to yield not only consensus contigated 5'EST sequences for mRNAs derived from different genes, but also consensus contigated 5'EST sequences for different mRNAs transcribed from the same gene, so-called variants, such as alternatively spliced mRNAs.

[0158] A subset of 5'ESTs free from endogenous contaminants and uninformative sequences, in which repeats had been masked, was first selected.
[0159] This whole set of sequences was first partitioned into small clusters, each containing sequences which exhibited perfect matches with each other over a given length and which derived from a small number of different genes. Some 5'EST sequences, so-called singletons, were not aligned using this approach because they were not homologous to any other sequence.
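
The partitioning step of paragraph [0159] can be illustrated with a minimal sketch that groups sequences sharing at least one exact match of a fixed length `k`, using a union-find structure; sequences left alone are reported as singletons. The match length, the use of shared k-mers as the matching criterion, and the function name `cluster_by_exact_match` are assumptions for illustration; the sketch also ignores reverse-complement matches and does not model the proprietary clustering.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def cluster_by_exact_match(seqs: Dict[str, str], k: int = 30) -> Tuple[List[List[str]], List[str]]:
    """Group sequences that share at least one identical k-mer."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_seen: Dict[str, str] = {}  # k-mer -> first sequence containing it
    for name, s in seqs.items():
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            if kmer in first_seen:
                union(first_seen[kmer], name)
            else:
                first_seen[kmer] = name

    groups: Dict[str, List[str]] = defaultdict(list)
    for name in seqs:
        groups[find(name)].append(name)
    clusters = [sorted(g) for g in groups.values() if len(g) > 1]
    singletons = sorted(g[0] for g in groups.values() if len(g) == 1)
    return clusters, singletons


ests = {
    "est1": "AAAACCCCGGGG",
    "est2": "CCCCGGGGTTTT",
    "est3": "TTTTAAAAGGGA",
}
print(cluster_by_exact_match(ests, k=8))  # ([['est1', 'est2']], ['est3'])
```
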
[0160] Thereafter, all variants of a given gene were identified in each cluster using proprietary software. 5'EST sequences belonging to the same variant were then contigated, and a consensus contigated 5'EST sequence was generated for each variant. All consensus contigated 5'EST sequences were subsequently compared to the whole set of individual 5'EST sequences used to obtain them.

[0161] If desired, the consensus contigated 5'EST sequences may be verified by identifying clones in nucleic acid samples derived from biological tissues, such as cDNA libraries, which hybridize to probes based on the sequences of the consensus contigated 5'ESTs, using any of the methods described herein, and sequencing those clones.

[0162] To assess the yield of new sequences, the 5'ESTs obtained and the consensus contigated 5'ESTs were compared to all known complete human mRNAs extracted from EMBL release 58 using BLASTN with the following parameters: S=1000, S2=1000, V=5, and B=5. All sequences with high-scoring pairs whose significance was above e-100 were kept. Then, the 5'ESTs and consensus contigated 5'ESTs were compared on both strands to all the human proteins extracted from SwissProt release 37, TrEMBL release 58, and Genseqp (Derwent's database of patented amino acid sequences) release 35.3 using BLASTX with the following parameters: S=450, S2=450, V=5, and B=5. All sequences with high-scoring pairs whose significance was above e-50 were kept. Using this process, about 86% of the 5'ESTs or consensus contigated 5'ESTs were considered unidentified.

EXAMPLE 10 

Identification of the Most Probable Open Reading 
Frame of 5' ESTs 

[0163] Subsequently, consensus contigated 5'ESTs 
and 5'ESTs were screened to identify those having an 
open reading frame (ORF). 

[0164] The nucleic acid sequence was first divided into several subsequences whose coding propensity was evaluated separately using one or several different methods known to those skilled in the art, such as the evaluation of N-mer frequency and its variants (Fickett and Tung, Nucleic Acids Res. 20:6441-50 (1992)) or the Average Mutual Information method (Grosse et al., International Conference on Intelligent Systems for Molecular Biology, Montreal, Canada, June 28-July 1, 1998). Each of the scores obtained by the techniques described above was then normalized by the extremities of its distribution and fused using a neural network into a unique score that represents the coding probability of a given subsequence. The coding probability scores obtained for each subsequence, and thus the probability score profiles obtained for each reading frame, were then linked to the initiation codons present in the sequence. For each open reading frame, defined as a nucleic acid sequence beginning with an ATG codon, an ORF score was determined. Preferably, this score is the sum of the probability scores computed for each subsequence corresponding to the considered ORF in the correct reading frame, corrected by a function that negatively accounts for locally high score values and positively accounts for sustained high score values. The most probable ORF, namely the one with the highest score, was selected.
[0165] Alternatively, open reading frames were simply defined as uninterrupted nucleic acid sequences longer than 150 nucleotides and beginning with an ATG codon.
[0166] In some embodiments, nucleic acid sequences encoding an "incomplete ORF", as used herein, namely an open reading frame in which a start codon has been identified but no stop codon has been identified, were obtained.

[0167] In other embodiments, nucleic acid sequences encoding a "complete ORF", as used herein, namely an open reading frame in which both a start codon and a stop codon have been identified, were obtained.

[0168] In a preferred embodiment, open reading frames encoding polypeptides of at least 50 amino acids were obtained.
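
The simple ORF definition of paragraph [0165], together with the complete/incomplete distinction of paragraphs [0166] and [0167], can be sketched as a scan of the three forward reading frames. This is an illustrative sketch only: it assumes that ORF length is counted from the A of the ATG up to, but not including, the stop codon, that only the most upstream ATG of each open stretch is reported, and that the reverse strand is handled separately; the function name `find_orfs` is hypothetical. It does not implement the coding-probability scoring of paragraph [0164].

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_len: int = 150) -> List[Tuple[int, int, str]]:
    """Return (start, end, kind) for ORFs on the forward strand.

    start is the 0-based position of the A of the ATG; end is exclusive and
    includes the stop codon for a "complete" ORF. An ORF is reported only if
    its coding part, from the ATG up to but excluding any stop codon, is
    longer than `min_len` nucleotides.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i  # most upstream ATG of this open stretch
            elif codon in STOP_CODONS:
                if i - start > min_len:
                    orfs.append((start, i + 3, "complete"))
                start = None
        if start is not None:
            # No stop codon before the end of the read: an incomplete ORF.
            end = start + (len(seq) - start) // 3 * 3
            if end - start > min_len:
                orfs.append((start, end, "incomplete"))
    return sorted(orfs)


seq = "CC" + "ATG" + "GCT" * 60 + "TAA" + "GG"
print(find_orfs(seq))  # [(2, 188, 'complete')]
```

With the default `min_len=150`, every reported ORF encodes more than 50 amino acids, consistent with the preferred embodiment of paragraph [0168].
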

[0169] To confirm that the chosen ORF actually encodes a polypeptide, the consensus contigated 5'EST or 5'EST may be used to obtain an extended cDNA using any of the techniques described herein, especially those described in Examples 17 and 18. Such extended cDNAs may then be screened for the most probable open reading frame using any of the techniques described herein. The amino acid sequence of the ORF encoded by the consensus contigated 5'EST or 5'EST may then be compared to the amino acid sequence of the ORF encoded by the extended cDNA, using any of the algorithms and parameters described herein, in order to determine whether the ORF encoded by the extended cDNA is essentially the same as the one encoded by the consensus contigated 5'EST or 5'EST.
[0170] Alternatively, to confirm that the chosen ORF actually encodes a polypeptide, the consensus contigated 5'EST or 5'EST may be used to obtain an extended cDNA using any of the techniques described herein, especially those described in Examples 17 and 18. Such an extended cDNA may then be inserted into an appropriate expression vector and used to express the polypeptide encoded by the extended cDNA as described herein. The expressed polypeptide may be isolated, purified, or enriched as described herein. Several methods known to those skilled in the art may then be used, alone or in combination, to determine whether the expressed polypeptide is the one actually encoded by the chosen ORF, herein referred to as the expected polypeptide. Such methods are based on determining predictable features of the expressed polypeptide, including but not limited to its amino acid sequence, its size, or its charge, and comparing these features to those predicted for the expected polypeptide. The following paragraphs present examples of such methods.

[0171] One of these methods involves determining at least a portion of the amino acid sequence of the expressed polypeptide using any technique known to those skilled in the art. For example, the amino-terminal residues may be determined using automated techniques based on Edman degradation, in which N-terminal residues are sequentially labeled and cleaved from the polypeptide of interest (see Stryer, Exploring Proteins, in Biochemistry, Freeman and Company, New York (1995)). The amino acid sequence of the expressed polypeptide may then be compared to the one predicted for the expected polypeptide using any of the algorithms and parameters described herein.
[0172] Alternatively, the size of the expressed polypeptide may be determined using techniques familiar to those skilled in the art, such as gel electrophoresis followed by Coomassie blue or silver staining, and subsequently compared to the size predicted for the expected polypeptide. Generally, the band corresponding to the expressed polypeptide will have a mobility near that expected based on the number of amino acids in the open reading frame of the extended cDNA. However, the band may have a mobility different from that expected as a result of modifications such as glycosylation, ubiquitination, or enzymatic cleavage.
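
As a rough illustration of the size comparison in paragraph [0172], the expected apparent molecular weight can be estimated from the length of the open reading frame. This sketch assumes an average mass of about 110 Da per amino acid residue, a common rule of thumb rather than a value given in this document, and the function name `expected_kda` is hypothetical; glycosylation, ubiquitination, or cleavage will shift the observed band, as noted above.

```python
def expected_kda(orf_nt_length: int, mean_residue_da: float = 110.0) -> float:
    """Estimate polypeptide mass in kDa from an ORF length in nucleotides
    (stop codon excluded), assuming an average residue mass of ~110 Da."""
    n_residues = orf_nt_length // 3
    return n_residues * mean_residue_da / 1000.0


print(expected_kda(900))  # 300 residues, about 33 kDa
```
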

[0173] Alternatively, specific antibodies or antipeptides may be generated against the expected polypeptide as described in Example 33 and used to perform immunoblotting or immunoprecipitation studies against the expressed polypeptide. The presence of a band in samples from cells containing the expression vector with the extended cDNA, which is absent in samples from cells containing the expression vector encoding an irrelevant polypeptide, indicates that the expected polypeptide or a portion thereof is being expressed. Generally, the band corresponding to the expressed polypeptide will have a mobility near that expected based on the number of amino acids in the open reading frame of the extended cDNA. However, the band may have a mobility different from that expected as a result of modifications such as glycosylation, ubiquitination, or enzymatic cleavage.



EXAMPLE 11

Identification of Potential Signal Sequences in 5' ESTs

[0174] The 5'ESTs or consensus contigated 5'ESTs found to include an ORF were then searched to identify potential signal motifs using slight modifications of the procedures disclosed in Von Heijne, Nucleic Acids Res. 14:4683-4690, 1986. Those sequences encoding a peptide with a score of at least 3.5 in the Von Heijne signal peptide identification matrix were considered to possess a signal sequence.

EXAMPLE 12

Confirmation of Accuracy of Identification of Potential
Signal Sequences in 5' ESTs

[0175] The accuracy of the above procedure for identifying signal sequences encoding signal peptides was evaluated by applying the method to the 43 amino acids located at the N terminus of all human SwissProt proteins. The computed Von Heijne score for each protein was compared with the known characterization of the protein as being a secreted protein or a non-secreted protein. In this manner, the number of non-secreted proteins having a score higher than 3.5 (false positives) and the number of secreted proteins having a score lower than 3.5 (false negatives) could be calculated.
[0176] Using the results of the above analysis, the probability that a peptide encoded by the 5' region of the mRNA is in fact a genuine signal peptide, given its Von Heijne score, was calculated under either the assumption that 10% of human proteins are secreted or the assumption that 20% of human proteins are secreted. The results of this analysis are shown in Figure 2.
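
The calculation of paragraph [0176] can be expressed with Bayes' rule. The sketch below assumes that the counts from Example 12 are summarized, for a given score threshold, as a true-positive rate (secreted proteins passing the threshold) and a false-positive rate (non-secreted proteins passing it); the counts in the usage lines are hypothetical and are not the values behind Figure 2, and the function names are illustrative.

```python
from typing import Tuple


def rates_from_counts(secreted_total: int, false_negatives: int,
                      non_secreted_total: int, false_positives: int) -> Tuple[float, float]:
    """True-positive and false-positive rates at the chosen score threshold."""
    tpr = 1 - false_negatives / secreted_total
    fpr = false_positives / non_secreted_total
    return tpr, fpr


def posterior_signal_peptide(tpr: float, fpr: float, prior: float) -> float:
    """P(genuine signal peptide | score passes threshold) by Bayes' rule,
    where `prior` is the assumed fraction of secreted proteins."""
    hit_if_secreted = tpr * prior
    hit_if_not = fpr * (1 - prior)
    return hit_if_secreted / (hit_if_secreted + hit_if_not)


# Hypothetical counts, for illustration only.
tpr, fpr = rates_from_counts(1000, 100, 9000, 900)
for prior in (0.10, 0.20):
    print(prior, round(posterior_signal_peptide(tpr, fpr, prior), 3))
```

The same false-positive rate weighs more heavily when secreted proteins are assumed to be rarer, which is why the two prior assumptions yield different probability curves.
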
[0177] Using the above method of identification of secretory proteins, 5' ESTs of the following polypeptides known to be secreted were obtained: human glucagon, gamma interferon induced monokine precursor, secreted cyclophilin-like protein, human pleiotrophin, and human biotinidase precursor. Thus, the above method successfully identified those 5' ESTs which encode a signal peptide.
[0178] To confirm that the signal peptide encoded by the 5' ESTs or consensus contigated 5' ESTs actually functions as a signal peptide, the signal sequences from the 5' ESTs or consensus contigated 5' ESTs may be cloned into a vector designed for the identification of signal peptides. Such vectors are designed to confer the ability to grow in selective medium only on host cells containing a vector with an operably linked signal sequence. For example, to confirm that a 5' EST or consensus contigated 5'EST encodes a genuine signal peptide, the signal sequence of the 5' EST or consensus contigated 5' EST may be inserted upstream of, and in frame with, a non-secreted form of the yeast invertase gene in signal peptide selection vectors such as those described in U.S. Patent No. 5,536,637. Growth of host cells containing signal sequence selection vectors with the correctly inserted 5' EST or consensus contigated 5' EST signal sequence confirms that the 5' EST or consensus contigated 5' EST encodes a genuine signal peptide.
[0179] Alternatively, the presence of a signal peptide may be confirmed by cloning the extended cDNAs obtained using the 5' ESTs or consensus contigated 5' ESTs into expression vectors such as pXT1, as described below, or by constructing promoter-signal sequence-reporter gene vectors which encode fusion proteins between the signal peptide and an assayable reporter protein. After introduction of these vectors into a suitable host cell, such as COS cells or NIH 3T3 cells, the growth medium may be harvested and analyzed for the presence of the secreted protein. The medium from these cells is compared to the medium from control cells containing vectors lacking the signal sequence or extended cDNA insert to identify vectors which encode a functional signal peptide or an authentic secreted protein.

EXAMPLE 13 

Analysis of the Sequences of the Invention 

[0180] The set of nucleic acid sequences of the invention (SEQ ID NOs. 24-3883 and 7744-19335) was obtained as described in Example 9. Subsequently, the most probable open reading frame was determined and signal sequences were searched for, as described in Examples 10 and 11, for all sequences of the invention.
[0181] The nucleotide sequences of SEQ ID NOs. 24-3883 and 7744-19335 and the preferred polypeptide sequences encoded by SEQ ID NOs. 24-3883 (i.e., the polypeptide sequences of SEQ ID NOs. 3884-7743) are provided in the appended sequence listing. In addition, for each of the nucleic acid sequences of the invention, referred to by its sequence identification number in the first column, Table I provides in the second column the positions of the first and last codons of the corresponding open reading frame.
[0182] For each of the consensus contigated 5'ESTs of the invention, referred to by its sequence identification number in the first column of Table II, the second column gives a list of the positions of the biological 5'ESTs which were used to obtain this consensus contigated 5'EST. For example, if the first column indicates 250 and the second column indicates 1-120; 6-230; 200-350, this means that the consensus contigated 5'EST of SEQ ID NO:250 was computed from 3 different 5'ESTs, the first one matching positions 1 to 120 of the consensus contigated 5'EST, the second one positions 6 to 230, and the third one positions 200 to 350.
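
The position lists of Table II can be read programmatically, for instance to check that the contributing 5'ESTs jointly cover the whole consensus contigated 5'EST. The sketch below assumes the field format of the example above (1-based, inclusive ranges separated by semicolons); the function names `parse_positions` and `covers` are hypothetical.

```python
from typing import List, Tuple


def parse_positions(field: str) -> List[Tuple[int, int]]:
    """Parse a field such as '1-120; 6-230; 200-350' into (start, end) pairs."""
    spans = []
    for part in field.split(";"):
        part = part.strip()
        if part:
            start, end = (int(x) for x in part.split("-"))
            spans.append((start, end))
    return spans


def covers(spans: List[Tuple[int, int]], length: int) -> bool:
    """True if the 1-based inclusive spans leave no gap in positions 1..length."""
    covered = 0  # last position covered so far
    for start, end in sorted(spans):
        if start > covered + 1:
            return False  # gap before this span
        covered = max(covered, end)
    return covered >= length


spans = parse_positions("1-120; 6-230; 200-350")
print(spans)               # [(1, 120), (6, 230), (200, 350)]
print(covers(spans, 350))  # True
```
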

[0183] If one of the nucleic acid sequences of SEQ ID NOs. 24-3883 and 7744-19335 is suspected of containing one or more incorrect or ambiguous nucleotides, the ambiguities can readily be resolved by resequencing a fragment containing the nucleotides to be evaluated. If one or more incorrect or ambiguous nucleotides are detected, the corrected sequences should be included in the clusters from which the sequences were isolated and used to compute other consensus contigated sequences, on which other ORFs would be identified. Nucleic acid fragments for resolving sequencing errors or ambiguities may be obtained from deposited clones or can be isolated using the techniques described herein. Resolution of any such ambiguities or errors may be facilitated by using primers which hybridize to sequences located close to the ambiguous or erroneous sequences. For example, the primers may hybridize to sequences within 50-75 bases of the ambiguity or error. Upon resolution of an error or ambiguity, the corresponding corrections can be made in the protein sequences encoded by the DNA containing the error or ambiguity. The amino acid sequence of the protein encoded by a particular clone can also be determined by expressing the clone in a suitable host cell, collecting the protein, and determining its sequence.
[0184] In addition, if one of the sequences of SEQ ID NOs. 3884-7743 is suspected of containing a truncated ORF as the result of a frameshift in the sequence, such frameshifting errors may be corrected by combining the following two approaches. The first involves thorough examination of all double predictions, i.e. all cases where the probability scores, as defined in Example 10, for two ORFs located in different reading frames are high and close, preferably differing by less than 0.4. Close examination of the region where the two possible ORFs overlap may help to detect the frameshift. In the second approach, homologies with known proteins are used to correct suspected frameshifts.
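
The first approach of paragraph [0184] can be sketched as a pairwise screen over predicted ORFs. This sketch assumes each ORF is described by its reading frame, its coordinates, and its probability score from Example 10; the cut-off for a "high" score (`min_score`) is not given in this document and is a hypothetical parameter, as are the `Orf` type and the function name `double_predictions`.

```python
from itertools import combinations
from typing import List, NamedTuple, Tuple


class Orf(NamedTuple):
    name: str
    frame: int    # reading frame, e.g. 0, 1, or 2
    start: int    # 0-based start (inclusive)
    end: int      # 0-based end (exclusive)
    score: float  # ORF probability score (Example 10)


def double_predictions(orfs: List[Orf], min_score: float,
                       max_diff: float = 0.4) -> List[Tuple[str, str, Tuple[int, int]]]:
    """Pairs of ORFs in different frames whose scores are both high and differ by
    less than `max_diff`, together with the region where they overlap."""
    flagged = []
    for a, b in combinations(orfs, 2):
        if a.frame == b.frame:
            continue
        if min(a.score, b.score) < min_score:
            continue  # at least one of the two predictions is not "high"
        if abs(a.score - b.score) >= max_diff:
            continue  # scores not close enough to be a double prediction
        lo, hi = max(a.start, b.start), min(a.end, b.end)
        if lo < hi:
            flagged.append((a.name, b.name, (lo, hi)))  # region to inspect for a frameshift
    return flagged


orfs = [
    Orf("x", 0, 0, 300, 2.0),
    Orf("y", 1, 200, 500, 1.8),
    Orf("z", 2, 600, 900, 1.9),
]
print(double_predictions(orfs, min_score=1.5))  # [('x', 'y', (200, 300))]
```
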

[0185] Of the identified clusters, some were shown to be multivariant, i.e. to contain several variants of the same gene. For each multivariant cluster, named by its internal reference (first column), Table III gives the list of all variant consensus contigated 5'ESTs (second column), each being represented by a different sequence identification number.

EXAMPLE 14

Categorization of 5' ESTs and Consensus Contigated
5'ESTs

[0186] The nucleic acid sequences of the present invention (SEQ ID NOs. 24-3883 and 7744-19335) were grouped based on their homology to known sequences as follows. All sequences were compared to the nucleic acid sequences of all vertebrates present in EMBL release 58 and in Geneseqn (Derwent's database of patented nucleic acids) release 35.3 or release 36. It should be noted that, because of the large number of sequences of the invention, the comparison of the polynucleotides of the invention to public sequences was carried out over a period of 15 days, meaning that the first sequences were compared to Geneseqn release 35.3 and the last ones to Geneseqn release 36. The comparison to EMBL vertebrate sequences was performed after masking of repeated sequences, using blastn with the parameters S=108 and X=16 followed by FASTA. The comparison to Geneseqn was performed after masking of repeated sequences, using blastn with the parameters S=108 and X=16.
[0187] All matches of a minimum of 30 nucleotides with 95% identity or with 100% identity were retrieved and used to compute Tables IVa and IVb, respectively. For each sequence of the invention, referred to by its sequence identification number in the first column, Tables IVa and IVb give the positions of its preferred fragments in the second column, entitled "Positions of preferred fragments." These preferred fragments are novel fragments which do not match any publicly available vertebrate sequence according to the algorithm, parameters, and criteria defined above. As used herein, the term "polynucleotides described in Tables IVa and IVb" refers to all of the preferred polynucleotide fragments defined in Tables IVa and IVb in this manner.
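
The derivation of preferred fragments in paragraph [0187] amounts to keeping the parts of each sequence not covered by any qualifying match. The sketch below assumes that matches are available as 1-based inclusive (start, end, percent identity) triples on the sequence of the invention; the function name `preferred_fragments` and the example hits are hypothetical. With `min_identity=95.0` it corresponds to the 95% criterion (Table IVa), and with `min_identity=100.0` to the 100% criterion (Table IVb).

```python
from typing import List, Tuple


def preferred_fragments(length: int, hits: List[Tuple[int, int, float]],
                        min_len: int = 30, min_identity: float = 95.0) -> List[Tuple[int, int]]:
    """Return the 1-based inclusive regions of a sequence of `length` bases that
    are not covered by any hit at least `min_len` bases long with at least
    `min_identity` percent identity."""
    kept = sorted((s, e) for s, e, ident in hits
                  if e - s + 1 >= min_len and ident >= min_identity)
    fragments = []
    next_free = 1  # first position not yet covered by a kept hit
    for s, e in kept:
        if s > next_free:
            fragments.append((next_free, s - 1))
        next_free = max(next_free, e + 1)
    if next_free <= length:
        fragments.append((next_free, length))
    return fragments


hits = [(1, 100, 98.0), (90, 150, 100.0), (300, 320, 99.0), (400, 480, 90.0)]
print(preferred_fragments(500, hits, min_identity=95.0))   # [(151, 500)]
print(preferred_fragments(500, hits, min_identity=100.0))  # [(1, 89), (151, 500)]
```
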
[0188] The term "polynucleotides described in Tables IVa and IVb updated" refers to all of the preferred polynucleotide fragments defined in the manner described above, except that the most recent updates of the EMBL and Derwent databases as of the filing date of the instant application are used to define the preferred fragments.
[0189] The present invention encompasses isolated, purified, or recombinant nucleic acids which consist of, consist essentially of, or comprise a contiguous span of at least 8, 10, 12, 15, 18, 20, 25, 35, 40, 50, 70, 80, 100, 250, or 500 nucleotides in length, to the extent that a contiguous span of these lengths is consistent with the lengths of the particular polynucleotide, of a polynucleotide described in Tables IVa and IVb, or a sequence complementary thereto, wherein said polynucleotide described in Tables IVa and IVb is selected individually or in any combination from the polynucleotides described in Tables IVa and IVb. In particular, the present invention encompasses isolated, purified, or recombinant vertebrate nucleic acids which consist of, consist essentially of, or comprise a contiguous span of at least 8, 10, 12, 15, 18, 20, 25, 35, 40, 50, 70, 80, 100, 250, or 500 nucleotides in length, to the extent that a contiguous span of these lengths is consistent with the lengths of the particular polynucleotide, of a polynucleotide described in Tables IVa and IVb, or a sequence complementary thereto, wherein said polynucleotide described in Tables IVa and IVb is selected individually or in any combination from the polynucleotides described in Tables IVa and IVb. In particular, the present invention encompasses isolated, purified, or recombinant human nucleic acids which consist of, consist essentially of, or comprise a contiguous span of at least 8, 10, 12, 15, 18, 20, 25, 35, 40, 50, 70, 80, 100, 250, or 500 nucleotides in length, to the extent that a contiguous span of these lengths is consistent with the lengths of the particular polynucleotide, of a polynucleotide described in Tables IVa and IVb, or a sequence complementary thereto, wherein said polynucleotide described in Tables IVa and IVb is selected individually or in any combination from the polynucleotides described in Tables IVa and IVb.
[0190] The present invention also encompasses isolated, purified, or recombinant nucleic acids which comprise, consist of, or consist essentially of a polynucleotide described in Tables IVa and IVb, or a sequence complementary thereto, wherein said polynucleotide is selected individually or in any combination from the polynucleotides described in Tables IVa and IVb. In particular, the present invention encompasses isolated, purified, or recombinant vertebrate nucleic acids which comprise, consist of, or consist essentially of a polynucleotide described in Tables IVa and IVb, or a sequence complementary thereto, wherein said polynucleotide is selected individually or in any combination from the polynucleotides described in Tables IVa and IVb. In particular, the present invention encompasses isolated, purified, or recombinant human nucleic acids which comprise, consist of, or consist essentially of a polynucleotide described in Tables IVa and IVb, or a sequence complementary thereto, wherein said polynucleotide is selected individually or in any combination from the polynucleotides described in Tables IVa and IVb.
[0191] The present invention encompasses isolated, purified, or recombinant nucleic acids which consist of, consist essentially of, or comprise a contiguous span of at least 8, 10, 12, 15, 18, 20, 25, 35, 40, 50, 70, 80, 100, 250, or 500 nucleotides in length, to the extent that a contiguous span of these lengths is consistent with the lengths of the particular polynucleotide, of a polynucleotide described in Tables IVa and IVb, or a sequence complementary thereto, wherein said polynucleotide described in Tables IVa and IVb is selected individually or in any combination from the polynucleotides described in Tables IVa and IVb updated. In particular, the present invention encompasses isolated, purified, or recombinant vertebrate nucleic acids which consist of, consist essentially of, or comprise a contiguous span of at least 8, 10, 12, 15, 18, 20, 25, 35, 40, 50, 70, 80, 100, 250, or 500 nucleotides in length, to the extent that a contiguous span of these lengths is consistent with the lengths of the particular polynucleotide, of a polynucleotide described in Tables IVa and IVb, or a sequence complementary thereto, wherein said polynucleotide described in Tables IVa and IVb is selected individually or in any combination from the polynucleotides described in Tables IVa and IVb updated. In particular, the present invention encompasses isolated, purified, or recombinant human nucleic acids which consist of, consist essentially of, or comprise a contiguous span of at least 8, 10, 12, 15, 18, 20, 25, 35, 40, 50, 70, 80, 100, 250, or 500 nucleotides in length, to the extent that a contiguous span of these lengths is consistent with the lengths of the particular polynucleotide, of a polynucleotide described in Tables IVa and IVb, or a sequence complementary thereto, wherein said polynucleotide described in Tables IVa and IVb is selected individually or in any combination from the polynucleotides described in Tables IVa and IVb updated.
[0192] The present invention also encompasses isolated, purified, or recombinant nucleic acids which comprise, consist of, or consist essentially of a polynucleotide described in Tables IVa and IVb, or a sequence complementary thereto, wherein said polynucleotide is selected individually or in any combination from the polynucleotides described in Tables IVa and IVb updated. In particular, the present invention encompasses isolated, purified, or recombinant vertebrate nucleic acids which consist of or consist essentially of a polynucleotide described in Tables IVa and IVb, or a sequence complementary thereto, wherein said polynucleotide is selected individually or in any combination from the polynucleotides described in Tables IVa and IVb updated. In particular, the present invention encompasses isolated, purified, or recombinant human nucleic acids which consist of or consist essentially of a polynucleotide described in Tables IVa and IVb, or a sequence complementary thereto, wherein said polynucleotide is selected individually or in any combination from the polynucleotides described in Tables IVa and IVb updated.

III. Evaluation of Spatial and Temporal Expression of mRNAs Corresponding to the 5' ESTs or Extended cDNAs

[0193] Each of the SEQ ID NOs. 24-3883 and 7744-19335 was also categorized based on the tissue from which its corresponding mRNA was obtained, as described below in Example 15.

EXAMPLE 15 

Expression Patterns of mRNAs From Which the 5' ESTs Were Obtained

[0194] Table V shows the spatial distribution of each nucleic acid sequence of the invention (SEQ ID NOs. 24-3883 and 7744-19335), referred to by its internal designation in the first column, together with the number of individual 5' ESTs per tissue used to assemble the consensus contigated 5' ESTs in the second column. A singleton is thus represented by a single 5' EST from a single tissue. Each type of tissue listed in Table V is encoded by a letter. The correspondence between the letter code and the tissue type is given in Table VI. For example, if the first column contains 47 and the second column contains the list A:1 C:4 F:3, this means that the consensus contigated 5' EST of SEQ ID NO. 47 was obtained from one 5' EST from brain, four 5' ESTs from fetal kidney, and three 5' ESTs from liver.
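A Table V entry can be read mechanically. The minimal Python sketch below assumes that each second-column entry is a whitespace-separated list of `code:count` pairs, as in the example above; its letter mapping covers only the three codes named in that example (the full correspondence is in Table VI), and the function name `parse_table_v_entry` is hypothetical.

```python
import re

# Only the three codes used in the example above; Table VI holds the full mapping.
TISSUE_CODES = {"A": "brain", "C": "fetal kidney", "F": "liver"}


def parse_table_v_entry(entry: str) -> dict:
    """Turn a Table V second-column entry such as 'A:1 C:4 F:3' into
    a {tissue code: number of 5' ESTs} dictionary."""
    counts = {}
    # Tolerate optional spaces around the colon, e.g. 'A: 1'.
    for code, number in re.findall(r"([A-Za-z]+)\s*:\s*(\d+)", entry):
        counts[code] = counts.get(code, 0) + int(number)
    return counts


counts = parse_table_v_entry("A:1 C:4 F:3")
for code, number in counts.items():
    print(TISSUE_CODES.get(code, code), number)   # brain 1 / fetal kidney 4 / liver 3
print("5' ESTs in the consensus:", sum(counts.values()))  # 8
```

The total of the counts is the number of 5' ESTs assembled into the consensus; an entry with a single pair whose count is 1 corresponds to a singleton.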
[0195] The bias in spatial distribution of the sequences of the invention was examined by comparing the relative proportions of the biological 5' ESTs of a given tissue in each cluster using the following statistical analysis. The under- or over-representation of 5' ESTs of a given cluster in a given tissue was assessed using the normal approximation of the binomial distribution. When the observed proportion of 5' ESTs of a given tissue in a given consensus had less than a 1% chance of occurring randomly according to the chi-square test, the frequency bias was reported as "low" or "high". The results are given in Table VII as follows. For each consensus contigated 5' EST showing a bias in tissue distribution, referred to by its sequence identification number in the first column, the list of tissues where 5' ESTs are underrepresented is given in the second column, entitled "low frequency", and the list of tissues where 5' ESTs are overrepresented is given in the third column, entitled "high frequency".
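The test described above can be sketched as a z-score on the tissue's EST count. This is an illustration, not the analysis actually used: the expected proportion `p0` is assumed here to be the tissue's share of the whole 5' EST collection, which the text does not specify, and the function name is hypothetical. For two categories the Pearson chi-square statistic equals z², so the 1% threshold can be applied to either.

```python
import math
from typing import Optional


def tissue_bias(k: int, n: int, p0: float, alpha: float = 0.01) -> Optional[str]:
    """Flag under- or over-representation of one tissue in one consensus.

    k:     5' ESTs from the tissue in the consensus
    n:     all 5' ESTs in the consensus
    p0:    expected proportion of 5' ESTs from the tissue (assumed to be its
           share of the whole 5' EST collection); must lie strictly in (0, 1)
    alpha: significance threshold (1% in the text)
    """
    mean = n * p0
    sd = math.sqrt(n * p0 * (1.0 - p0))   # binomial standard deviation
    z = (k - mean) / sd                    # normal approximation to the binomial
    # z**2 follows a chi-square law with 1 degree of freedom, so this two-sided
    # normal p-value equals the chi-square p-value.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    if p_value >= alpha:
        return None
    return "high" if z > 0 else "low"


print(tissue_bias(k=12, n=40, p0=0.05))   # high
print(tissue_bias(k=0, n=200, p0=0.2))    # low
print(tissue_bias(k=2, n=40, p0=0.05))    # None
```

The normal approximation is only reliable when the expected counts `n * p0` and `n * (1 - p0)` are not too small; for very small clusters an exact binomial test would be the safer choice.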

[0196] In addition to categorizing the 5' ESTs and consensus contigated 5' ESTs with respect to their tissue of origin, the spatial and temporal expression patterns of the mRNAs corresponding to the 5' ESTs and consensus contigated 5' ESTs, as well as their expression levels, may be determined as described in Example 16 below.

[0197] Characterization of the spatial and temporal 
expression patterns and expression levels of these mR- 
NAs is useful for constructing expression vectors capa- 
ble of producing a desired level of gene product in a de- 
sired spatial or temporal manner, as will be discussed 
in more detail below. 

[0198] Furthermore, 5' ESTs and consensus contigated 5' ESTs whose corresponding mRNAs are associated with disease states may also be identified. For example, a particular disease may result from the lack of expression, overexpression, or underexpression of an mRNA corresponding to a 5' EST or consensus contigated 5' EST. By comparing mRNA expression patterns and quantities in samples taken from healthy individuals with those from individuals suffering from a particular disease, 5' ESTs or consensus contigated 5' ESTs responsible for the disease may be identified.
[0199] It will be appreciated that the results of the above characterization procedures for 5' ESTs and consensus contigated 5' ESTs also apply to extended cDNAs (obtainable as described below) which contain sequences adjacent to the 5' ESTs and consensus contigated 5' ESTs. It will also be appreciated that, if desired, characterization may be delayed until extended cDNAs have been obtained rather than characterizing the 5' ESTs or consensus contigated 5' ESTs themselves.

EXAMPLE 16 

Evaluation of Expression Levels and Patterns of 
mRNAs Corresponding to EST-Related Nucleic Acids 

[0200] Expression levels and patterns of mRNAs corresponding to EST-related nucleic acids may be analyzed by solution hybridization with long probes as described in International Patent Application No. WO 97/05277, the entire contents of which are hereby incorporated by reference. Briefly, an EST-related nucleic acid, fragment of an EST-related nucleic acid, positional segment of an EST-related nucleic acid, or fragment of a positional segment of an EST-related nucleic acid corresponding to the gene encoding the mRNA to be characterized is inserted at a cloning site immediately downstream of a bacteriophage (T3, T7, or SP6) RNA polymerase promoter to produce antisense RNA. Preferably, the EST-related nucleic acid, fragment of an EST-related nucleic acid, positional segment of an EST-related nucleic acid, or fragment of a positional segment of an EST-related nucleic acid is 100 or more nucleotides in length. The plasmid is linearized and transcribed in the presence of ribonucleotides comprising modified ribonucleotides (i.e., biotin-UTP and DIG-UTP). An excess of this doubly labeled RNA is hybridized in solution with mRNA isolated from cells or tissues of interest. The hybridizations are performed under standard stringent conditions (40-50°C for 16 hours in an 80% formamide, 0.4 M NaCl buffer, pH 7-8). The unhybridized probe is removed by digestion with ribonucleases specific for single-stranded RNA (e.g., RNases CL3, T1, Phy M, U2, or A). The presence of the biotin-UTP modification enables capture of the hybrid on a microtitration plate coated with streptavidin. The presence of the DIG modification enables the hybrid to be detected and quantified by ELISA using an anti-DIG antibody coupled to alkaline phosphatase.

[0201] The EST-related nucleic acid, fragment of an EST-related nucleic acid, positional segment of an EST-related nucleic acid, or fragment of a positional segment of an EST-related nucleic acid may also be tagged with nucleotide sequences for the serial analysis of gene expression (SAGE) as disclosed in UK Patent Application No. 2 305 241 A, the entire contents of which are incorporated by reference. In this method, cDNAs are prepared from a cell, tissue, organism, or other source of nucleic acid for which gene expression patterns are to be determined. The resulting cDNAs are separated into two pools. The cDNAs in each pool are cleaved with a first restriction endonuclease, called an anchoring enzyme, having a recognition site which is likely to be present at least once in most cDNAs. The fragments which contain the 5'-most or 3'-most region of the cleaved cDNA are isolated by binding to a capture medium such as streptavidin-coated beads. A first oligonucleotide linker, having a first sequence for hybridization of an amplification primer and an internal restriction site for a so-called tagging endonuclease, is ligated to the digested cDNAs in the first pool. Digestion with this second endonuclease produces short tag fragments from the cDNAs.
[0202] A second oligonucleotide, having a second sequence for hybridization of an amplification primer and an internal restriction site, is ligated to the digested cDNAs in the second pool. The cDNA fragments in the second pool are also digested with the tagging endonuclease to generate short tag fragments derived from the cDNAs in the second pool. The tags resulting from digestion of the first and second pools with the anchoring enzyme and the tagging endonuclease are ligated to one another to produce so-called ditags. In some embodiments, the ditags are concatemerized to produce ligation products containing from 2 to 200 ditags. The tag sequences are then determined and compared to the sequences of the EST-related nucleic acid, fragment of an EST-related nucleic acid, positional segment of an EST-related nucleic acid, or fragment of a positional segment of an EST-related nucleic acid to determine which 5' ESTs, consensus contigated 5' ESTs, or extended cDNAs are expressed in the cell, tissue, organism, or other source of nucleic acids from which the tags were derived. In this way, the expression pattern of the 5' ESTs, consensus contigated 5' ESTs, or extended cDNAs in the cell, tissue, organism, or other source of nucleic acids is obtained.
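The computational side of this comparison, counting tags and matching them to known sequences, can be sketched as follows. The sketch assumes an anchoring enzyme recognizing CATG (the NlaIII site used in the original SAGE protocol) and 10-base tags read immediately 3' of the 3'-most anchoring site; both values and all function names are illustrative assumptions, not parameters given in the text. Splitting sequenced concatemers into ditags is left out.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple

ANCHOR_SITE = "CATG"   # assumed anchoring-enzyme site (NlaIII-like)
TAG_LENGTH = 10        # assumed tag length released by the tagging enzyme


def extract_tag(cdna: str) -> Optional[str]:
    """Return the tag lying just 3' of the 3'-most anchoring site, if any."""
    cdna = cdna.upper()
    pos = cdna.rfind(ANCHOR_SITE)
    if pos == -1:
        return None
    start = pos + len(ANCHOR_SITE)
    tag = cdna[start:start + TAG_LENGTH]
    return tag if len(tag) == TAG_LENGTH else None   # too close to the end: no tag


def count_tags(cdnas: Iterable[str]) -> Counter:
    """Count each tag; tag abundance reflects the abundance of its transcript."""
    return Counter(tag for tag in map(extract_tag, cdnas) if tag is not None)


def assign_tags(tags: Counter, references: Dict[str, str]) -> Dict[str, Tuple[int, List[str]]]:
    """Map each tag to (count, names of reference sequences containing it)."""
    return {
        tag: (count, [name for name, seq in references.items() if tag in seq.upper()])
        for tag, count in tags.items()
    }


cdnas = ["GGCATGAAAAACCCCCTTTT", "CATGTTTTTTTTTTCATGAAAAACCCCCG", "GGGGGGG"]
tags = count_tags(cdnas)
print(assign_tags(tags, {"EST-1": "TTAAAAACCCCCGG", "EST-2": "GCGCGC"}))
# {'AAAAACCCCC': (2, ['EST-1'])}
```

A tag that matches several references is ambiguous; a fuller analysis would report such tags separately rather than credit all matching sequences.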

[0203] Quantitative analysis of gene expression may also be performed using arrays. As used herein, the term array means a one-dimensional, two-dimensional, or multidimensional arrangement of EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of EST-related nucleic acids. Preferably, the EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of EST-related nucleic acids are at least 15 nucleotides in length. More preferably, the EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of EST-related nucleic acids are at least 100 nucleotides long. More preferably, the fragments are more than 100 nucleotides in length. In some embodiments, the EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of EST-related nucleic acids may be more than 500 nucleotides long.

[0204] For example, quantitative analysis of gene expression may be performed with EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of EST-related nucleic acids in a complementary DNA microarray as described by Schena et al. (Science 270:467-470, 1995; Proc. Natl. Acad. Sci. USA 93:10614-10619, 1996). EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of EST-related nucleic acids are amplified by PCR and arrayed from 96-well microtiter plates onto silylated microscope slides using high-speed robotics. Printed arrays are incubated in a humid chamber to allow rehydration of the array elements and rinsed once in 0.2% SDS for 1 min, twice in water for 1 min, and once for 5 min in sodium borohydride solution. The arrays are submerged in water for 2 min at 95°C, transferred into 0.2% SDS for 1 min, rinsed twice with water, air dried, and stored in the dark at 25°C.
[0205] Cell or tissue mRNA is isolated or commercially obtained, and probes are prepared by a single round of reverse transcription. Probes are hybridized to 1 cm² microarrays under a 14 × 14 mm glass coverslip for 6-12 hours at 60°C. Arrays are washed for 5 min at 25°C in low stringency wash buffer (1× SSC/0.2% SDS), then for 10 min at room temperature in high stringency wash buffer (0.1× SSC/0.2% SDS). Arrays are scanned in 0.1× SSC using a fluorescence laser scanning device fitted with a custom filter set. Accurate differential expression measurements are obtained by taking the average of the ratios of two independent hybridizations.
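The final averaging step is simple arithmetic. In the sketch below, each hybridization is assumed to yield, for every array element, a pair of already background-corrected signals (sample, reference); normalization and background subtraction are outside its scope, and the element identifiers and signal values are made up for illustration.

```python
from statistics import mean
from typing import Dict, List, Tuple

Signals = Dict[str, Tuple[float, float]]  # element id -> (sample signal, reference signal)


def averaged_ratios(hybridizations: List[Signals]) -> Dict[str, float]:
    """Average the sample/reference ratio of each element over independent hybridizations."""
    # Only elements measured in every hybridization are reported.
    shared = set.intersection(*(set(h) for h in hybridizations))
    return {
        element: mean(h[element][0] / h[element][1] for h in hybridizations)
        for element in sorted(shared)
    }


run_1 = {"est-47": (1200.0, 400.0), "est-48": (500.0, 500.0)}
run_2 = {"est-47": (1000.0, 500.0), "est-48": (450.0, 500.0)}
print(averaged_ratios([run_1, run_2]))  # est-47 averages 2.5; est-48 about 0.95
```

The text specifies the plain average of ratios; averaging log-ratios instead is a common alternative that treats two-fold increases and decreases symmetrically.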
[0206] Quantitative analysis of the expression of genes may also be performed with EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of EST-related nucleic acids in complementary DNA arrays as described by Pietu et al. (Genome Research 6:492-503, 1996). The EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of EST-related nucleic acids are PCR amplified and spotted on membranes. Then, mRNAs originating from various tissues or cells are labeled with radioactive nucleotides. After hybridization and washing under controlled conditions, the hybridized mRNAs are detected by phosphor imaging or autoradiography. Duplicate experiments are performed, and a quantitative analysis of differentially expressed mRNAs is then performed.

[0207] Alternatively, expression analysis of the EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of EST-related nucleic acids can be done through high-density nucleotide arrays as described by Lockhart et al. (Nature Biotechnology 14:1675-1680, 1996) and Sosnowsky et al. (Proc. Natl. Acad. Sci. 94:1119-1123, 1997). Oligonucleotides of 15-50 nucleotides corresponding to sequences of EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of EST-related nucleic acids are synthesized directly on the chip (Lockhart et al., supra) or synthesized and then addressed to the chip (Sosnowsky et al., supra). Preferably, the oligonucleotides are about 20 nucleotides in length.
[0208] cDNA probes labeled with an appropriate compound, such as biotin, digoxigenin, or a fluorescent dye, are synthesized from the appropriate mRNA population and then randomly fragmented to an average size of 50 to 100 nucleotides. The probes are then hybridized to the chip. After washing as described in Lockhart et al., supra, and application of different electric fields (Sosnowsky et al., supra), the dyes or labeling compounds are detected and quantified. Duplicate hybridizations are performed. Comparative analysis of the intensity of the signal originating from cDNA probes on the same target oligonucleotide in different cDNA samples indicates a differential expression of the mRNA corresponding to the 5' EST, consensus contigated 5' EST, or extended cDNA from which the oligonucleotide sequence has been designed.


IV. Use of 5' ESTs or Consensus Contigated 5'ESTs 
to Clone Extended cDNAs and to Clone the 
Corresponding Genomic DNAs 

[0209] Once 5' ESTs or consensus contigated 5' ESTs which include the 5' end of the corresponding mRNAs have been selected using the procedures described above, they can be utilized to isolate extended cDNAs which contain sequences adjacent to the 5' ESTs or consensus contigated 5' ESTs. The extended cDNAs may include the entire coding sequence of the protein encoded by the corresponding mRNA, including the authentic translation start site. If the extended cDNA encodes a secreted protein, it may contain the signal sequence and the sequence encoding the mature protein remaining after cleavage of the signal peptide. Extended cDNAs which include the entire coding sequence of the protein encoded by the corresponding mRNA are referred to herein as "full-length cDNAs." Alternatively, the extended cDNAs may not include the entire coding sequence of the protein encoded by the corresponding mRNA, although they do include sequences adjacent to the 5' ESTs or consensus contigated 5' ESTs. In some embodiments in which the extended cDNAs are derived from an mRNA encoding a secreted protein, the extended cDNAs may include only the sequence encoding the mature protein remaining after cleavage of the signal peptide, or only the sequence encoding the signal peptide.

[0210] Example 17 below describes a general PCR-based method for obtaining extended cDNAs using 5' ESTs or consensus contigated 5' ESTs. Example 18 describes hybridization-based methods to obtain genomic DNAs which encode the mRNAs from which the 5' ESTs or consensus contigated 5' ESTs were derived, mRNAs from which the 5' ESTs or consensus contigated 5' ESTs were derived, or nucleic acids which are homologous to EST-related nucleic acids. Example 19 below describes the cloning and sequencing of several extended cDNAs, including extended cDNAs which include the entire coding sequence and authentic 5' end of the corresponding mRNA for several secreted proteins.

[0211] The methods of Examples 17 and 18 can also be used to obtain extended cDNAs which encode less than the entire coding sequence of proteins encoded by the genes corresponding to the 5' ESTs or consensus contigated 5' ESTs. In some embodiments, the extended cDNAs isolated using these methods encode at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids of one of the proteins encoded by the sequences of SEQ ID NOs. 24-3883 and 7744-19335. In some embodiments, the extended cDNAs isolated using these methods encode at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids of one of the proteins encoded by the sequences of SEQ ID NOs. 24-3883.

EXAMPLE 17 

General Method for Using 5' ESTs or Consensus
Contigated 5' ESTs to Clone and Sequence Extended
cDNAs which Include the Entire Coding Region and the
Authentic 5' End of the Corresponding mRNA

[0212] The following general method may be used to quickly and efficiently isolate extended cDNAs including sequence adjacent to the sequences of the 5' ESTs or consensus contigated 5' ESTs used to obtain them. This method may be applied to obtain extended cDNAs for any 5' EST or consensus contigated 5' EST of the invention, including those 5' ESTs and consensus contigated 5' ESTs encoding secreted proteins. This method is illustrated in Figure 3.

1. Obtaining Extended cDNAs 

[0213] The method takes advantage of the known 5' sequence of the mRNA. A reverse transcription reaction is conducted on purified mRNA with a poly dT primer containing a nucleotide sequence at its 5' end that allows the addition of a known sequence at the end of the cDNA which corresponds to the 3' end of the mRNA. Such a primer and a commercially available reverse transcriptase enzyme are added to a buffered mRNA sample, yielding a reverse transcript anchored at the 3' polyA site of the RNAs. Preferably, a thermostable enzyme is used. Nucleotide monomers are then added to complete the first strand synthesis.

[0214] After removal of the mRNA hybridized to the 
first cDNA strand by alkaline hydrolysis, the products of 
the alkaline hydrolysis and the residual poly dT primer 
can be eliminated with an exclusion column. 
[0215] Subsequently, a pair of nested primers on each end of the cDNA to be amplified is designed based on the known 5' sequence from the 5' EST or consensus contigated 5' EST and the known 3' end added by the poly dT primer used in the first strand synthesis. Software used to design primers is based either on the GC content and melting temperatures of oligonucleotides, such as OSP (Hillier and Green, PCR Meth. Appl. 1:124-128, 1991), or on the octamer frequency disparity method (Griffais et al., Nucleic Acids Res. 19:3887-3891, 1991), such as PC-Rare (http://bioinformatics.weizmann.ac.il/software/PC-Rare/doc/manuel.html). Preferably, the nested primers at the 5' end and the nested primers at the 3' end are separated from one another by four to nine bases. These primer sequences may be selected to have melting temperatures and specificities suitable for use in PCR.
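As a rough illustration of the GC-content and melting-temperature criteria, the sketch below uses the Wallace rule, Tm ≈ 2 °C × (A + T) + 4 °C × (G + C), which is only a crude estimate for short oligonucleotides and is not the model used by OSP. The acceptance windows (40-60% GC, 55-65 °C) are assumed values. The four-to-nine-base spacing is interpreted here as the number of template bases between the 3' end of the outer primer and the 5' end of the inner primer; that interpretation, like the function names, is an assumption.

```python
def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in the primer."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer: str) -> int:
    """Wallace rule: 2 degrees per A/T plus 4 degrees per G/C (short oligos only)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def primer_ok(primer: str, tm_range=(55, 65), gc_range=(0.40, 0.60)) -> bool:
    """Accept a primer whose Tm and GC fraction fall inside the assumed windows."""
    return (tm_range[0] <= wallace_tm(primer) <= tm_range[1]
            and gc_range[0] <= gc_fraction(primer) <= gc_range[1])


def nested_gap_ok(outer_start: int, outer_len: int, inner_start: int,
                  min_gap: int = 4, max_gap: int = 9) -> bool:
    """Check the four-to-nine-base separation between outer and inner primers,
    with positions counted along the strand both primers read along."""
    gap = inner_start - (outer_start + outer_len)
    return min_gap <= gap <= max_gap


outer = "AGCTAGCTAGCTAGCTAGCT"   # 20-mer, 50% GC
print(wallace_tm(outer), gc_fraction(outer), primer_ok(outer))      # 60 0.5 True
print(nested_gap_ok(outer_start=0, outer_len=20, inner_start=26))   # True (gap of 6)
```

Primer-design programs additionally screen for self-complementarity, primer-dimers, and repeated sequences, which this sketch ignores.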
[0216] A first PCR run is performed using the outer primer from each of the nested pairs. A second PCR run using the inner primer from each of the nested pairs is then performed on a small sample of the first PCR product. Thereafter, the primers and remaining nucleotide monomers are removed.

[0217] It will be appreciated that a simple PCR reaction using one primer on each end of the cDNA to be amplified may also be performed rather than using pairs of primers in a nested PCR procedure. However, because of the possibility of PCR artifacts in this simpler method, a nested PCR protocol is preferred.

2. Sequencing Extended cDNAs or Fragments Thereof 

[0218] Due to the lack of position constraints on the design of 5' nested primers compatible with PCR using the OSP software, amplicons of two types are obtained. Preferably, the second 5' primer is located upstream of the translation initiation codon, thus yielding a nested PCR product containing the entire coding sequence. Such an extended cDNA may be used in a direct cloning procedure as described in section a below. However, in some cases, the second 5' primer is located downstream of the translation initiation codon, thereby yielding a PCR product containing only part of the ORF. Such incomplete PCR products are submitted to a modified procedure described in section b below.
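Which of the two procedures applies depends only on where the inner 5' primer falls relative to the translation initiation codon. A minimal sketch, assuming the primer is given on the sense strand and the position of the initiation codon has already been predicted from the 5' EST; the function name and the toy sequence are hypothetical.

```python
def amplicon_procedure(sense_seq: str, inner_5p_primer: str, start_codon_pos: int) -> str:
    """Say whether a nested PCR product goes to procedure a or procedure b.

    The product begins at the 5' end of the inner 5' primer, so it keeps the
    whole ORF only if the primer starts at or upstream of the initiation codon.
    """
    pos = sense_seq.upper().find(inner_5p_primer.upper())
    if pos == -1:
        raise ValueError("inner 5' primer not found in the sense sequence")
    return "a: complete ORF" if pos <= start_codon_pos else "b: incomplete ORF"


mrna_5p = "GCGCTTAGCCATGGCTAAA"
atg = mrna_5p.find("ATG")                            # 10 in this toy sequence
print(amplicon_procedure(mrna_5p, "TTAGCC", atg))    # a: complete ORF
print(amplicon_procedure(mrna_5p, "GGCTAAA", atg))   # b: incomplete ORF
```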

a) Nested PCR products containing complete ORFs 


[0219] When the resulting nested PCR product contains the complete coding sequence, as predicted from the 5' EST or consensus contigated 5' EST sequence, it is directly cloned in an appropriate vector as described in section 3.

b) Nested PCR products containing incomplete ORFs 

[0220] When the amplicon does not contain the complete coding sequence, intermediate steps are necessary to obtain both the complete coding sequence and a PCR product containing the full coding sequence. The complete coding sequence can be assembled from several partial sequences determined directly from different PCR products.

[0221] Once the full coding sequence has been completely determined, new primers compatible for PCR use are then designed to obtain amplicons containing the whole coding region. However, in such cases, 3' primers compatible for PCR use are located inside the 3' UTR of the corresponding mRNA, thus yielding amplicons which lack part of this region, i.e., the polyA tract and sometimes the polyadenylation signal, as illustrated in Figure 3. Such extended cDNAs are then cloned into an appropriate vector as described in section 3.

c) Sequencing extended cDNAs 


[0222] Sequencing of extended cDNAs can be performed using a Dye Terminator approach with the AmpliTaq DNA polymerase FS kit available from Perkin Elmer.

[0223] In order to sequence long PCR fragments, primer walking is performed using software such as OSP to choose primers and automated computer software such as ASMG (Sutton et al., Genome Science Technol. 1:9-19, 1995) to construct contigs of walking sequences including the initial 5' tag. Preferably, primer walking is performed until the sequences of full-length cDNAs are obtained.

[0224] Completion of the sequencing of a given extended cDNA fragment may be assessed by comparing the sequence length to the size of the corresponding nested PCR product. When Northern blot data are available, the size of the mRNA detected for a given PCR product may also be used to confirm that the sequence is complete. Sequences which do not fulfill these criteria are discarded and undergo a new isolation procedure.

3. Cloning Extended cDNAs 

[0225] The PCR product containing the full coding sequence is then cloned in an appropriate vector. For example, the extended cDNAs can be cloned into any expression vector known in the art, such as pED6dpc2 for extended cDNAs encoding potentially secreted proteins (DiscoverEase, Genetics Institute, Cambridge, MA).
[0226] Cloned PCR products are then entirely sequenced in order to obtain at least two sequences per clone. Preferably, the sequences are obtained from both sense and antisense strands according to the aforementioned procedure with the following modifications. First, both 5' and 3' ends of cloned PCR products are sequenced in order to confirm the identity of the clone. Second, primer walking is performed if the full coding region has not yet been obtained. Contigation is then performed using primer walking sequences for cloned products as well as walking sequences that have already been contigated for uncloned PCR products. The sequence is considered complete when the resulting contigs include the whole coding region as well as overlapping sequences with vector DNA on both ends. All the contigated sequences for each cloned amplicon are then used to obtain a consensus sequence.

4. Selection of Cloned Full length Sequences 


a) Computer analysis of extended cDNAs 

[0227] Following identification of contaminants and masking of repeats, structural features of the sequences of extended cDNAs, e.g., the polyA tail and polyadenylation signal, are determined using methods known to those skilled in the art. For example, the algorithm, parameters, and criteria defined in Figure 10 may be used. Briefly, a polyA tail is defined as a homopolymeric stretch of at least 11 A's with at most one alternative base within it. The polyA tail search is restricted to the last 20 nucleotides of the sequence and limited to stretches of 11 consecutive A's because sequencing reactions are often not readable after such a polyA stretch. To search for a polyadenylation signal, the polyA tail is clipped from the full-length sequence. The 50 nucleotides preceding the polyA tail are searched for the canonical polyadenylation signal AAUAAA, allowing one mismatch to account for possible sequencing errors as well as known variation in the canonical sequence of the polyadenylation signal.
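The polyA tail and polyadenylation signal rules can be written down directly. The sketch below works on the DNA sense strand (so AAUAAA is searched as AATAAA). Within the last 20 bases it keeps the stretch with the most A's among those holding at least 11 A's and at most one other base, then returns the first AATAAA-like hexamer with at most one mismatch in the 50 bases upstream of the tail. It illustrates the rules stated above and is not a reproduction of the Figure 10 algorithm; the function names are hypothetical.

```python
from typing import Optional

SIGNAL = "AATAAA"  # DNA form of the canonical AAUAAA polyadenylation signal


def find_polya_tail(seq: str, window: int = 20, min_a: int = 11) -> Optional[int]:
    """Start index of the A-richest stretch (at most one non-A base, at least
    `min_a` A's) within the last `window` bases, or None if there is none."""
    seq = seq.upper()
    offset = max(0, len(seq) - window)
    region = seq[offset:]
    best_start, best_a = None, 0
    left = non_a = 0
    for right, base in enumerate(region):
        if base != "A":
            non_a += 1
        while non_a > 1:                      # keep at most one alternative base
            if region[left] != "A":
                non_a -= 1
            left += 1
        a_count = (right - left + 1) - non_a
        if a_count >= min_a and a_count > best_a:
            start = left if region[left] == "A" else left + 1  # tail starts on an A
            best_start, best_a = offset + start, a_count
    return best_start


def find_polya_signal(seq: str, tail_start: int, span: int = 50,
                      max_mismatches: int = 1) -> Optional[int]:
    """First AATAAA-like hexamer (<= max_mismatches) in the `span` bases
    preceding the clipped polyA tail, or None."""
    seq = seq.upper()
    begin = max(0, tail_start - span)
    upstream = seq[begin:tail_start]
    for i in range(len(upstream) - len(SIGNAL) + 1):
        hexamer = upstream[i:i + len(SIGNAL)]
        if sum(a != b for a, b in zip(hexamer, SIGNAL)) <= max_mismatches:
            return begin + i
    return None


cdna = "GC" * 20 + "AATAAA" + "GC" * 7 + "A" * 15
tail = find_polya_tail(cdna)
print(tail, find_polya_signal(cdna, tail))  # 60 40
```

Returning the first match is a simplification; when several candidate hexamers occur, a real pipeline might prefer an exact AATAAA over a one-mismatch variant.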

[0228] Functional features of the sequences of extended cDNAs, e.g., ORFs and signal sequences, are then determined as follows. The three upper strand frames of extended cDNAs are searched for ORFs, defined as the maximum length fragments beginning with a translation initiation codon and ending with a stop codon. ORFs encoding at least 80 amino acids are preferred. If extended cDNAs encoding secreted proteins are desired, each identified ORF is then scanned for the presence of a signal peptide using the matrix method described in Example 11.
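A minimal sketch of this ORF search on the three upper strand frames. In each frame it opens an ORF at the first ATG after the previous stop codon, which gives the maximum-length fragment, closes it at the next in-frame TAA, TAG, or TGA, and keeps it if it encodes at least 80 amino acids. ORFs that run off the end of the sequence without a stop codon are ignored, since the definition above requires one. Signal peptide scoring (Example 11) is not reproduced, and the function name is hypothetical.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_aa: int = 80) -> List[Tuple[int, int, int]]:
    """Return (frame, start, end) for maximal ATG...stop ORFs on the upper strand.

    `end` is exclusive and includes the stop codon; the encoded length in
    amino acids is (end - start) // 3 - 1.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                          # first ATG since the last stop
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_aa:     # residues from ATG up to the stop
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs


toy = "CC" + "ATG" + "GCT" * 80 + "TAA" + "GG"   # ATG + 80 codons + stop, in frame 2
print(find_orfs(toy))              # [(2, 2, 248)]
print(find_orfs(toy, min_aa=82))   # []
```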

[0229] Sequences of extended cDNAs are then compared, on a nucleotide or protein basis, to public sequences available at the time of filing.

b) Selection of full-length cDNAs of interest 

[0230] A negative selection may then be performed in order to eliminate unwanted cloned sequences resulting from either contaminants or PCR artifacts, as follows. Sequences matching contaminant sequences such as vector DNA, tRNA, mtRNA, or rRNA sequences are discarded, as well as those encoding ORF sequences exhibiting extensive homology to repeats.
[0231] Sequences obtained by direct cloning (section 1a) but lacking a polyA tail may be discarded. Only ORFs ending either before the polyA tail (section 1a) or before the end of the cloned 3' UTR (section 1b) may be selected. If extended cDNAs encoding secreted proteins are desired, ORFs containing a signal peptide are considered. In addition, ORFs encoding unlikely mature proteins, such as mature proteins whose size is less than 20 amino acids or less than 25% of the immature protein size, may be eliminated if necessary.
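Once the signal peptide cleavage site is known, the size criteria for unlikely mature proteins reduce to two comparisons. A sketch, assuming lengths in amino acids for the full precursor (the immature protein) and for the predicted signal peptide; the thresholds are the 20 amino acid and 25% values given above, and the function name is hypothetical.

```python
def plausible_mature_protein(precursor_aa: int, signal_peptide_aa: int,
                             min_aa: int = 20, min_fraction: float = 0.25) -> bool:
    """Keep an ORF only if its predicted mature protein is at least `min_aa` long
    and at least `min_fraction` of the immature (precursor) protein."""
    mature_aa = precursor_aa - signal_peptide_aa
    return mature_aa >= min_aa and mature_aa >= min_fraction * precursor_aa


print(plausible_mature_protein(120, 22))   # True: 98 aa mature
print(plausible_mature_protein(35, 20))    # False: only 15 aa mature
print(plausible_mature_protein(200, 160))  # False: 40 aa is 20% of the precursor
```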
[0232] Then, for each remaining full-length cDNA containing several ORFs, a preselection of ORFs may be performed using the following criteria. The longest ORF is preferred. If extended cDNAs encoding secreted proteins are desired and the ORF sizes are similar, the chosen ORF is the one whose signal peptide has the highest score according to the von Heijne method.
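The preselection rule can be sketched as below. The text does not define when ORF sizes count as similar, so the sketch assumes a threshold (ORFs at least 90% as long as the longest one). It also assumes each ORF already carries a signal peptide score computed elsewhere with the von Heijne method. The `Orf` class and `preselect_orf` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Orf:
    name: str
    length_aa: int
    signal_score: float   # von Heijne signal peptide score, computed elsewhere


def preselect_orf(orfs: List[Orf], secreted: bool = False,
                  similar_fraction: float = 0.9) -> Orf:
    """Prefer the longest ORF; when secreted proteins are wanted, choose the
    best-scoring signal peptide among ORFs of size similar to the longest."""
    longest = max(orfs, key=lambda orf: orf.length_aa)
    if not secreted:
        return longest
    similar = [orf for orf in orfs
               if orf.length_aa >= similar_fraction * longest.length_aa]
    return max(similar, key=lambda orf: orf.signal_score)


candidates = [Orf("orf1", 300, 2.1), Orf("orf2", 285, 6.4), Orf("orf3", 120, 9.0)]
print(preselect_orf(candidates).name)                 # orf1
print(preselect_orf(candidates, secreted=True).name)  # orf2
```

`orf3` has the best score but is excluded because it is far shorter than the longest ORF. The 0.9 threshold only illustrates this behavior and is not a value taken from the text.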
[0233] Sequences of full-length cDNA clones may then be compared pairwise after masking of the repeat sequences. Full-length cDNA sequences exhibiting extensive homology may be clustered in the same class. Each cluster may then be subjected to a cluster analysis that detects sequences resulting from internal priming or from alternative splicing, identical sequences, or sequences with several frameshifts. A selection may be made between clones belonging to the same class in order to detect clones encoding homologous but distinct ORFs, which may both be selected if they both contain sequences of interest.

[0234] Selection of full-length cDNA clones encoding sequences of interest may subsequently be performed using the following criteria. Structural parameters (initial tag, polyadenylation site, and signal) are checked first. Then, homologies with known nucleic acids and proteins are examined in order to determine whether the clone sequence matches a known nucleotide or protein sequence and, in the latter case, its covering rate and the date at which the sequence became public. If there is no extensive match with sequences other than ESTs or genomic DNA, or if the clone sequence provides substantial new information, such as encoding a protein resulting from alternative splicing of an mRNA coding for an already known protein, the sequence is kept. Examples of such cloned full-length cDNAs containing sequences of interest are described in Example 19. Sequences resulting from chimeras or double inserts, or located on chromosome breaking points as assessed by homology to other sequences, may be discarded during this procedure.

[0235] Extended cDNAs prepared as described 
above may be subsequently engineered to obtain nu- 
cleic acids which include desired portions of the extend- 
ed cDNA using conventional techniques such as sub- 
cloning, PCR, or in vitro oligonucleotide synthesis. For 
example, nucleic acids which include only the full coding 
sequences may be obtained using techniques known to 
those skilled in the art. Alternatively, conventional tech- 
niques may be applied to obtain nucleic acids which 
contain only part of the coding sequences. In the case 
of nucleic acids encoding secreted proteins, nucleic ac- 
ids containing only the coding sequence for the mature 
protein remaining after the signal peptide is cleaved off 
or nucleic acids which contain only the coding sequenc- 
es for the signal peptides may be obtained. 
[0236] Similarly, nucleic acids containing any other
desired portion of the coding sequences for the encoded
protein may be obtained. For example, the nucleic acid
may contain at least 10, 15, 18, 20, 25, 28, 30, 35, 40,
50, 75, 100, 150, 200, 300, 400, 500, 1000 or 2000
consecutive bases of an extended cDNA.
[0237] Once an extended cDNA has been obtained, it can
be sequenced to determine the amino acid sequence it
encodes. Once the encoded amino acid sequence has been
determined, one can create and identify any of the many
conceivable cDNAs that will encode that protein simply
by using the degeneracy of the genetic code. For
example, allelic variants or other homologous nucleic
acids can be identified as described below.
Alternatively, nucleic acids encoding the desired amino
acid sequence can be synthesized in vitro.
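The degeneracy argument can be illustrated with a short Python sketch. It assumes the standard genetic code (NCBI translation table 1) and counts, or for short peptides lists, the distinct DNA coding sequences that specify a given amino acid sequence; the helper names `count_encodings` and `enumerate_encodings` are hypothetical and introduced only for illustration.

```python
from itertools import product
from math import prod

# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Map each amino acid (and "*" for stop) to its synonymous codons, in table order.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def count_encodings(protein, include_stop=False):
    """Number of distinct DNA coding sequences for a protein (one-letter codes)."""
    total = prod(len(SYNONYMS[aa]) for aa in protein.upper())
    if include_stop:
        total *= len(SYNONYMS["*"])
    return total


def enumerate_encodings(protein):
    """Yield every DNA sequence encoding the protein (practical only for short peptides)."""
    for codons in product(*(SYNONYMS[aa] for aa in protein.upper())):
        yield "".join(codons)


if __name__ == "__main__":
    print(count_encodings("MKL"))           # 1 * 2 * 6 = 12
    print(list(enumerate_encodings("MK")))  # ['ATGAAA', 'ATGAAG']
```

Because the count grows multiplicatively with length, even a short signal peptide has an enormous number of possible coding sequences, so in practice one coding sequence is chosen (for example by codon preference) rather than enumerating them all.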
[0238] In a preferred embodiment, the coding sequence
may be selected using the known codon or codon pair
preferences of the host organism in which the cDNA is
to be expressed.
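One simple way to apply codon preferences is to tabulate codon usage from coding sequences known to be well expressed in the host and then, for each residue, pick the most frequently used synonymous codon. The Python sketch below assumes that approach; the reference sequences and the function names are hypothetical, and it handles single-codon preferences only, not codon-pair preferences.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_usage(reference_cds):
    """Count codons across in-frame reference coding sequences from the host."""
    counts = Counter()
    for cds in reference_cds:
        cds = cds.upper()
        for i in range(0, len(cds) - 2, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TABLE:
                counts[codon] += 1
    return counts


def preferred_coding_sequence(protein, usage):
    """Back-translate a protein using the most used synonymous codon per residue.

    Ties (including residues never seen in the reference set) fall back to
    the first synonymous codon in standard-table order.
    """
    pieces = []
    for aa in protein.upper():
        synonyms = [c for c, a in CODON_TABLE.items() if a == aa]
        if not synonyms:
            raise ValueError(f"unknown amino acid code: {aa!r}")
        pieces.append(max(synonyms, key=lambda c: usage[c]))
    return "".join(pieces)


if __name__ == "__main__":
    usage = codon_usage(["ATGCTGCTGAAA", "CTGAAGAAA"])  # hypothetical host genes
    print(preferred_coding_sequence("MLK", usage))      # ATGCTGAAA
```

Choosing the single most frequent codon is the simplest policy; a real design would usually also avoid unwanted restriction sites and extreme local G+C content.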

[0239] In addition to PCR-based methods for obtaining
cDNAs which include the authentic 5' end of the
corresponding mRNA as well as the complete protein
coding sequence of the corresponding mRNA, traditional
hybridization-based methods may also be employed. These
methods may also be used to obtain the genomic DNAs
which encode the mRNAs from which the 5' ESTs or
consensus contigated 5' ESTs were derived, the mRNAs
from which the 5' ESTs or consensus contigated 5' ESTs
were derived, or nucleic acids which are homologous to
EST-related nucleic acids. In particular, such methods
may be used to obtain extended cDNAs which include the
entire coding region of the mRNAs from which the 5' ESTs
or consensus contigated 5' ESTs were derived. Example 18
below provides examples of such methods.

EXAMPLE 18 

Methods for Obtaining Extended cDNAs which Include
the Entire Coding Region and the Authentic 5' End of the
Corresponding mRNA or Nucleic Acids Homologous to
Extended cDNAs, 5' ESTs or Consensus Contigated 5'
ESTs



[0240] A full-length cDNA library can be made using
the strategies described in Examples 1-5. Alternatively,
a cDNA library or genomic DNA library may be obtained
from a commercial source or made using techniques
familiar to those skilled in the art.

[0241] Such cDNA or genomic DNA libraries may be used
to isolate extended cDNAs obtained from 5' ESTs or
consensus contigated 5' ESTs, or nucleic acids
homologous to extended cDNAs, 5' ESTs, or consensus
contigated 5' ESTs, as follows. The cDNA library or
genomic DNA library is hybridized to a detectable probe.
The detectable probe may comprise at least 10, 15, 18,
20, 25, 28, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400
or 500 consecutive nucleotides of the 5' EST, consensus
contigated 5' EST, or extended cDNA. Techniques for
identifying cDNA clones in a cDNA library which
hybridize to a given probe sequence are disclosed in
Sambrook et al., Molecular Cloning: A Laboratory Manual,
2d Ed., Cold Spring Harbor Laboratory Press, 1989, the
disclosure of which is incorporated herein by reference.
The same techniques may be used to isolate genomic DNAs.

[0242] Briefly, cDNA or genomic DNA clones which
hybridize to the detectable probe are identified and
isolated for further manipulation as follows. The
detectable probe described in the preceding paragraph is
labeled with a detectable label such as a radioisotope or
a fluorescent molecule. Techniques for labeling the probe
are well known and include phosphorylation with
polynucleotide kinase, nick translation, in vitro
transcription, and non-radioactive techniques. The cDNAs
or genomic DNAs in the library are transferred to a
nitrocellulose or nylon filter and denatured. After
blocking of nonspecific sites, the filter is incubated
with the labeled probe for an amount of time sufficient
to allow binding of the probe to cDNAs or genomic DNAs
containing a sequence capable of hybridizing thereto.

[0243] By varying the stringency of the hybridization 
conditions used to identify cDNAs or genomic DNAs 
which hybridize to the detectable probe, cDNAs or ge- 
nomic DNAs having different levels of homology to the 
probe can be identified and isolated as described below. 

1. Identification of cDNA or Genomic DNA Sequences
Having a High Degree of Homology to the Labeled
Probe

[0244] To identify cDNAs or genomic DNAs having a 
high degree of homology to the probe sequence, the 
melting temperature of the probe may be calculated us- 
ing the following formulas: 

[0245] For probes between 14 and 70 nucleotides in
length, the melting temperature (Tm) is calculated using
the formula:

$$T_m = 81.5 + 16.6\,\log_{10}[\mathrm{Na^+}] + 0.41\,(\%\mathrm{G{+}C}) - \frac{600}{N}$$

where [Na+] is the molar sodium ion concentration, %G+C
is the G+C content of the probe expressed as a
percentage, and N is the length of the probe.

[0246] If the hybridization is carried out in a solution
containing formamide, the melting temperature may be
calculated using the equation:

$$T_m = 81.5 + 16.6\,\log_{10}[\mathrm{Na^+}] + 0.41\,(\%\mathrm{G{+}C}) - 0.63\,(\%\,\text{formamide}) - \frac{600}{N}$$

where N is the length of the probe.
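For a quick numerical check, the salt-adjusted formula can be evaluated directly. This Python sketch assumes [Na+] in mol/L and G+C content as a percentage (the 0.41 coefficient applies per percentage point, as in the standard Sambrook formulation); setting `formamide_percent=0` gives the formamide-free case, which the text restricts to probes of 14 to 70 nucleotides. The function names and the example probe are illustrative only.

```python
import math


def probe_tm(length_nt, gc_percent, na_molar, formamide_percent=0.0):
    """Estimate a probe melting temperature in degrees C.

    Tm = 81.5 + 16.6*log10[Na+] + 0.41*(%G+C) - 0.63*(%formamide) - 600/N
    """
    if length_nt <= 0 or na_molar <= 0:
        raise ValueError("probe length and [Na+] must be positive")
    return (81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent
            - 0.63 * formamide_percent - 600.0 / length_nt)


def gc_percent(probe):
    """G+C content of a DNA probe, as a percentage."""
    probe = probe.upper()
    return 100.0 * sum(base in "GC" for base in probe) / len(probe)


def hybridization_window(tm, lower_offset=25.0, upper_offset=15.0):
    """Temperature range 15-25 degrees C below Tm, as suggested in the text."""
    return (tm - lower_offset, tm - upper_offset)


if __name__ == "__main__":
    probe = "ACGTACGTACGTACGTACGT"  # hypothetical 20-mer, 50% G+C
    tm = probe_tm(len(probe), gc_percent(probe), na_molar=1.0)
    print(round(tm, 1))                                               # 72.0
    print(round(probe_tm(20, 50.0, 1.0, formamide_percent=50.0), 1))  # 40.5
```

Each percentage point of formamide lowers the estimate by 0.63°C, which is why 50% formamide hybridizations are run near 42°C rather than 68°C.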

[0247] Prehybridization may be carried out in 6X SSC,
5X Denhardt's reagent, 0.5% SDS, 100 µg denatured
fragmented salmon sperm DNA, or in 6X SSC, 5X Denhardt's
reagent, 0.5% SDS, 100 µg denatured fragmented salmon
sperm DNA, 50% formamide. The formulas for SSC and
Denhardt's solutions are listed in Sambrook et al.,
supra.

[0248] Hybridization is conducted by adding the
detectable probe to the prehybridization solutions listed
above. Where the probe comprises double-stranded DNA, it
is denatured before addition to the hybridization
solution. The filter is contacted with the hybridization
solution for a sufficient period of time to allow the
probe to hybridize to extended cDNAs or genomic DNAs
containing sequences complementary or homologous
thereto. For probes over 200 nucleotides in length, the
hybridization may be carried out at 15-25°C below the
Tm. For shorter probes, such as oligonucleotide probes,
the hybridization may be conducted at 15-25°C below the
Tm. Preferably, for hybridizations in 6X SSC, the
hybridization is conducted at approximately 68°C.
Preferably, for hybridizations in solutions containing
50% formamide, the hybridization is conducted at
approximately 42°C.

[0249] All of the foregoing hybridizations would be
considered to be under "stringent" conditions.

[0250] Following hybridization, the filter is washed in
2X SSC, 0.1% SDS at room temperature for 15 minutes. The
filter is then washed with 0.1X SSC, 0.5% SDS at room
temperature for 30 minutes to 1 hour. Thereafter, the
filter is washed at the hybridization temperature in
0.1X SSC, 0.5% SDS. A final wash is conducted in 0.1X
SSC at room temperature.
[0251] cDNAs or genomic DNAs which have hybridized to
the probe are identified by autoradiography or other
conventional techniques.

2. Obtaining cDNA or Genomic DNA Sequences Having 
Lower Degrees of Homology to the Labeled Probe 

[0252] The above procedure may be modified to identify
cDNAs or genomic DNAs having decreasing levels of
homology to the probe sequence. For example, to obtain
cDNAs or genomic DNAs of decreasing homology to the
detectable probe, less stringent conditions may be used:
the hybridization temperature may be decreased in
increments of 5°C from 68°C to 42°C in a hybridization
buffer having a sodium concentration of approximately
1 M. Following hybridization, the filter may be washed
with 2X SSC, 0.5% SDS at the temperature of
hybridization. These conditions are considered to be
"moderate" conditions above 50°C and "low" conditions
below 50°C.

[0253] Alternatively, the hybridization may be carried
out in buffers, such as 6X SSC, containing formamide at
a temperature of 42°C. In this case, the concentration of
formamide in the hybridization buffer may be reduced in
5% increments from 50% to 0% to identify clones having
decreasing levels of homology to the probe. Following
hybridization, the filter may be washed with 6X SSC,
0.5% SDS at 50°C. These conditions are considered to be
"moderate" conditions above 25% formamide and "low"
conditions below 25% formamide.

cDNAs or genomic DNAs which have hybridized to the probe
are identified by autoradiography.
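The two reduced-stringency series described above can be laid out programmatically. This Python sketch only enumerates the steps and attaches the "moderate"/"low" labels defined in the text; note that 5°C steps starting at 68°C reach 43°C rather than exactly 42°C, and values exactly at 50°C or 25% formamide are labeled "unspecified" because the text does not classify them. All function names are illustrative.

```python
def classify_temperature(temp_c):
    """Stringency label for the ~1 M Na+ temperature series (degrees C)."""
    if temp_c > 50:
        return "moderate"
    if temp_c < 50:
        return "low"
    return "unspecified"  # the text defines only the ranges above and below 50 C


def classify_formamide(percent):
    """Stringency label for hybridization at 42 C with the given % formamide."""
    if percent > 25:
        return "moderate"
    if percent < 25:
        return "low"
    return "unspecified"


def temperature_schedule(start=68, stop=42, step=5):
    """Hybridization temperatures from start down toward stop in step increments."""
    return [(t, classify_temperature(t)) for t in range(start, stop - 1, -step)]


def formamide_schedule(start=50, stop=0, step=5):
    """Formamide concentrations from start down to stop in step increments."""
    return [(p, classify_formamide(p)) for p in range(start, stop - 1, -step)]


if __name__ == "__main__":
    for temp, label in temperature_schedule():
        print(f"{temp} C: {label}")
    for pct, label in formamide_schedule():
        print(f"{pct}% formamide: {label}")
```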

3. Determination of the Degree of Homology between
the Obtained cDNAs or Genomic DNAs and 5'ESTs,
Consensus Contigated 5'ESTs, or Extended cDNAs or
Between the Polypeptides Encoded by the Obtained
cDNAs or Genomic DNAs and the Polypeptides
Encoded by the 5'ESTs, Consensus Contigated 5'ESTs,
or Extended cDNAs

[0254] To determine the level of homology between the
hybridized cDNA or genomic DNA and the 5'EST, consensus
contigated 5'EST or extended cDNA from which the probe
was derived, the nucleotide sequences of the hybridized
nucleic acid and of the 5'EST, consensus contigated 5'EST
or extended cDNA from which the probe was derived are
compared. The sequences of the 5'EST, consensus
contigated 5'EST or extended cDNA from which the probe
was derived and the sequences of the cDNA or genomic DNA
which hybridized to the detectable probe may be stored
on a computer readable medium as described below and
compared to one another using any of a variety of
algorithms familiar to those skilled in the art, such as
those described below.
[0255] To determine the level of homology between the
polypeptide encoded by the hybridizing cDNA or genomic
DNA and the polypeptide encoded by the 5'EST, consensus
contigated 5'EST or extended cDNA from which the probe
was derived, the polypeptide sequence encoded by the
hybridized nucleic acid and the polypeptide sequence
encoded by the 5'EST, consensus contigated 5'EST or
extended cDNA from which the probe was derived are
compared. These two polypeptide sequences may be stored
on a computer readable medium as described below and
compared to one another using any of a variety of
algorithms familiar to those skilled in the art, such as
those described below.
[0256] Protein and/or nucleic acid sequence homologies
may be evaluated using any of the variety of sequence
comparison algorithms and programs known in the art.
Such algorithms and programs include, but are by no
means limited to, TBLASTN, BLASTP, FASTA, TFASTA, and
CLUSTALW (Pearson and Lipman, 1988, Proc. Natl. Acad.
Sci. USA 85:2444-2448; Altschul et al., 1990, J. Mol.
Biol. 215(3):403-410; Thompson et al., 1994, Nucleic
Acids Res. 22(22):4673-4680; Higgins et al., 1996,
Methods Enzymol. 266:383-402; Altschul et al., 1993,
Nature Genetics 3:266-272).
[0257] In a particularly preferred embodiment, protein
and nucleic acid sequence homologies are evaluated using
the Basic Local Alignment Search Tool ("BLAST"), which is
well known in the art (see, e.g., Karlin and Altschul,
1990, Proc. Natl. Acad. Sci. USA 87:2267-2268; Altschul
et al., 1990, J. Mol. Biol. 215:403-410; Altschul et al.,
1993, Nature Genetics 3:266-272; Altschul et al., 1997,
Nucleic Acids Res. 25:3389-3402). In particular, five
specific BLAST programs are used to perform the
following tasks:
(1) BLASTP and BLAST3 compare an amino acid query
sequence against a protein sequence database;

(2) BLASTN compares a nucleotide query sequence against
a nucleotide sequence database;

(3) BLASTX compares the six-frame conceptual
translation products of a query nucleotide sequence
(both strands) against a protein sequence database;

(4) TBLASTN compares a query protein sequence against a
nucleotide sequence database, translated in all six
reading frames (both strands); and

(5) TBLASTX compares the six-frame translations of a
nucleotide query sequence against the six-frame
translations of a nucleotide sequence database.
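The translated searches (BLASTX, TBLASTN, TBLASTX) rest on six-frame conceptual translation. As an illustration only (this is not BLAST's own code), the Python sketch below translates a nucleotide sequence in the three forward frames and in the three frames of its reverse complement using the standard genetic code; the frame labels +1 to +3 and -1 to -3 are a common convention assumed here.

```python
from itertools import product

# Standard genetic code (NCBI table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; stops become '*', ambiguous codons 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Conceptual translations of both strands in all three reading frames."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    print(six_frame_translations("ATGGCCTAA"))
    # {'+1': 'MA*', '-1': 'LGH', '+2': 'WP', '-2': '*A', '+3': 'GL', '-3': 'RP'}
```

A translated search compares each of these six peptide strings (or, for TBLASTX, all pairs of them) with the other sequence, which is why such searches can detect coding similarity even when the nucleotide sequences have diverged at synonymous positions.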



[0258] The BLAST programs identify homologous sequences
by identifying similar segments, which are referred to
herein as "high-scoring segment pairs," between a query
amino or nucleic acid sequence and a test sequence which
is preferably obtained from a protein or nucleic acid
sequence database. High-scoring segment pairs are
preferably identified (i.e., aligned) by means of a
scoring matrix, many of which are known in the art.
Preferably, the scoring matrix used is the BLOSUM62
matrix (Gonnet et al., 1992, Science 256:1443-1445;
Henikoff and Henikoff, 1993, Proteins 17:49-61). Less
preferably, the PAM or PAM250 matrices may also be used
(see, e.g., Schwartz and Dayhoff, eds., 1978, Matrices
for Detecting Distance Relationships: Atlas of Protein
Sequence and Structure, Washington: National Biomedical
Research Foundation).
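To make the notion of a high-scoring segment pair concrete, the following Python sketch finds the best ungapped segment pair between two nucleotide sequences by scanning every diagonal and applying Kadane's maximum-subarray method to the per-position scores. It is a simplified, exhaustive illustration and not the BLAST heuristic (which seeds with short word hits and extends them); the +2/-3 match/mismatch scores are an assumption standing in for a scoring matrix such as BLOSUM62 used for proteins.

```python
def best_ungapped_segment(query, subject, match=2, mismatch=-3):
    """Highest-scoring ungapped segment pair between two nucleotide sequences.

    Returns (score, query_start, subject_start, length); a score of 0 means
    no positive-scoring segment was found.
    """
    query, subject = query.upper(), subject.upper()
    best = (0, 0, 0, 0)
    # Each offset fixes a diagonal: query index i pairs with subject index i - offset.
    for offset in range(-(len(subject) - 1), len(query)):
        q = max(offset, 0)  # first query index on this diagonal
        s = q - offset      # corresponding subject index
        run_score, run_start, k = 0, 0, 0
        while q + k < len(query) and s + k < len(subject):
            step = match if query[q + k] == subject[s + k] else mismatch
            if run_score > 0:
                run_score += step
            else:  # a non-positive prefix never helps; restart the segment here
                run_score, run_start = step, k
            if run_score > best[0]:
                best = (run_score, q + run_start, s + run_start, k - run_start + 1)
            k += 1
    return best


if __name__ == "__main__":
    print(best_ungapped_segment("TTTTACGTACGT", "GGACGTACGTGG"))  # (16, 4, 2, 8)
```

The exhaustive scan is quadratic in sequence length, which is exactly the cost that BLAST's word-seeding heuristic is designed to avoid when searching large databases.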
[0259] The BLAST programs evaluate the statistical
significance of all high-scoring segment pairs
identified, and preferably select those segments which
satisfy a user-specified threshold of significance, such
as a user-specified percent homology. Preferably, the
statistical significance of a high-scoring segment pair
is evaluated using the statistical significance formula
of Karlin (see, e.g., Karlin and Altschul, 1990, Proc.
Natl. Acad. Sci. USA 87:2267-2268).
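The Karlin-Altschul statistics relate a raw segment-pair score S to the number of equally good hits expected by chance when a query of length m is searched against a database of total length n, E = K·m·n·e^(-λS), where λ and K depend on the scoring system. The Python sketch below treats λ and K as assumed inputs and ignores the effective-length (edge) corrections that real BLAST programs apply.

```python
import math


def bit_score(raw_score, lam, k):
    """Normalized (bit) score: S' = (lambda*S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(raw_score, query_len, db_len, lam, k):
    """Expected number of chance hits scoring at least S: E = K*m*n*exp(-lambda*S)."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


if __name__ == "__main__":
    # Illustrative parameters only (values of this order are reported for
    # BLOSUM62 with gap costs 11/1); real searches take them from the program.
    lam, k = 0.267, 0.041
    e = expect_value(50, query_len=300, db_len=1_000_000, lam=lam, k=k)
    bits = bit_score(50, lam, k)
    print(f"E = {e:.3g}, bit score = {bits:.1f}")
    # The two forms agree: E = m * n * 2**(-S')
    assert math.isclose(e, 300 * 1_000_000 * 2 ** (-bits))
```

Because E falls exponentially with S, a user-specified significance threshold is usually expressed as a maximum E-value rather than a minimum raw score.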
[0260] The parameters used with the above algorithms
may be adapted depending on the sequence length and
degree of homology studied. In some embodiments, the
parameters may be the default parameters used by the
algorithms in the absence of instructions from the user.

[0261] In some embodiments, the level of homology
between the hybridized nucleic acid and the extended
cDNA, 5'EST, or consensus contigated 5'EST from which the
probe was derived may be determined using the FASTDB
algorithm described in Brutlag et al., Comp. App. Biosci.
6:237-245, 1990. In such analyses the parameters may be
selected as follows: Matrix=Unitary, k-tuple=4, Mismatch
Penalty=1, Joining Penalty=30, Randomization Group
Length=0, Cutoff Score=1, Gap Penalty=5, Gap Size
Penalty=0.05, Window Size=500 or the length of the
sequence which hybridizes to the probe, whichever is
shorter. Because the FASTDB program does not consider 5'
or 3' truncations when calculating homology levels, if
the sequence which hybridizes to the probe is truncated
relative to the sequence of the extended cDNA, 5'EST, or
consensus contigated 5'EST from which the probe was
derived, the homology level is manually adjusted. The
number of nucleotides of the extended cDNA, 5'EST, or
consensus contigated 5'EST which are not matched or
aligned with the hybridizing sequence is calculated, the
percentage of the total length of the extended cDNA,
5'EST, or consensus contigated 5'EST which these
non-matched or non-aligned nucleotides represent is
determined, and this percentage is subtracted from the
homology level. For example, suppose the hybridizing
sequence is 700 nucleotides in length and the extended
cDNA, 5'EST, or consensus contigated 5'EST sequence is
1000 nucleotides in length, wherein the first 300 bases
at the 5' end of the extended cDNA, 5'EST, or consensus
contigated 5'EST are absent from the hybridizing
sequence and the overlapping 700 nucleotides are
identical. The homology level would then be adjusted as
follows. The non-matched, non-aligned 300 bases
represent 30% of the length of the extended cDNA, 5'EST,
or consensus contigated 5'EST. If the overlapping 700
nucleotides are 100% identical, the adjusted homology
level would be 100-30=70% homology. It should be noted
that the preceding adjustments are only made when the
non-matched or non-aligned nucleotides are at the 5' or
3' ends. No adjustments are made if the non-matched or
non-aligned sequences are internal or under any other
conditions.
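The terminal-truncation correction is simple arithmetic, sketched here in Python (the function name is hypothetical). It expresses the unmatched terminal residues as a percentage of the length of the reference sequence from which the probe was derived, as in the 1000-nucleotide worked example, and the same arithmetic applies to the N- or C-terminal correction used when comparing polypeptides.

```python
def adjusted_homology(reference_length, unaligned_5prime, unaligned_3prime,
                      identity_percent):
    """Correct a FASTDB-style identity for terminal truncation.

    reference_length: length of the extended cDNA/EST (or of the polypeptide
        it encodes) from which the probe was derived.
    unaligned_5prime / unaligned_3prime: residues of that reference left
        unmatched at each end; internal gaps are deliberately not counted.
    identity_percent: identity reported over the aligned overlap.
    """
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    unaligned = unaligned_5prime + unaligned_3prime
    if not 0 <= unaligned <= reference_length:
        raise ValueError("unaligned residues must fit within the reference")
    return identity_percent - 100.0 * unaligned / reference_length


if __name__ == "__main__":
    print(adjusted_homology(1000, 300, 0, 100.0))  # 70.0 (nucleotide example)
    print(adjusted_homology(100, 20, 0, 100.0))    # 80.0 (polypeptide example)
```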

[0262] For example, using the above methods, nucle- 
ic acids having at least 95% nucleic acid homology, at 
least 96% nucleic acid homology, at least 97% nucleic 
acid homology, at least 98% nucleic acid homology, at 
least 99% nucleic acid homology, or more than 99% nu- 
cleic acid homology to the extended cDNA, 5'EST, or 
consensus contigated 5' EST from which the probe was 
derived may be obtained and identified. Such nucleic 
acids may be allelic variants or related nucleic acids 
from other species. Similarly, by using progressively 
less stringent hybridization conditions one can obtain 
and identify nucleic acids having at least 90%, at least
85%, at least 80% or at least 75% homology to the
extended cDNA, 5'EST, or consensus contigated 5' EST
from which the probe was derived.
[0263] Using the above methods and algorithms such 
as FASTA with parameters depending on the sequence 
length and degree of homology studied, for example the 
default parameters used by the algorithms in the ab- 
sence of instructions from the user, one can obtain nu- 
cleic acids encoding proteins having at least 99%, at 
least 98%, at least 97%, at least 96%, at least 95%, at 
least 90%, at least 85%, at least 80% or at least 75% 
homology to the protein encoded by the extended cD- 
NA, 5'EST, or consensus contigated 5' EST from which 
the probe was derived. In some embodiments, the ho- 
mology levels can be determined using the "default" 
opening penalty and the "default" gap penalty, and a 
scoring matrix such as PAM 250 (a standard scoring ma- 
trix; see Dayhoff et al., in: Atlas of Protein Sequence and 
Structure, Vol. 5, Supp. 3 (1978)). 
[0264] Alternatively, the level of polypeptide homology
may be determined using the FASTDB algorithm described
by Brutlag et al., Comp. App. Biosci. 6:237-245, 1990. In
such analyses the parameters may be selected as follows:
Matrix=PAM 0, k-tuple=2, Mismatch Penalty=1, Joining
Penalty=20, Randomization Group Length=0, Cutoff Score=1,
Window Size=Sequence Length, Gap Penalty=5, Gap Size
Penalty=0.05, Window Size=500 or the length of the
homologous sequence, whichever is shorter. If the
homologous amino acid sequence is shorter than the amino
acid sequence encoded by the extended cDNA, 5'EST, or
consensus contigated 5'EST as a result of an N-terminal
and/or C-terminal deletion, the results may be manually
corrected as follows. First, the number of amino acid
residues of the amino acid sequence encoded by the
extended cDNA, 5'EST, or consensus contigated 5'EST which
are not matched or aligned with the homologous sequence
is determined. Then, the percentage of the length of the
sequence encoded by the extended cDNA, 5'EST, or
consensus contigated 5'EST which the non-matched or
non-aligned amino acids represent is calculated. This
percentage is subtracted from the homology level. For
example, where the amino acid sequence encoded by the
extended cDNA, 5'EST, or consensus contigated 5'EST is
100 amino acids in length, the homologous sequence is 80
amino acids in length, and the homologous sequence is
truncated at the N-terminal end with respect to the
amino acid sequence encoded by the extended cDNA or
5'EST, the homology level is calculated as follows. In
this scenario there are 20 non-matched, non-aligned
amino acids in the sequence encoded by the extended cDNA,
5'EST, or consensus contigated 5'EST. This represents 20%
of the length of the amino acid sequence encoded by the
extended cDNA, 5'EST, or consensus contigated 5'EST. If
the remaining amino acids are 100% identical between the
two sequences, the homology level would be 100%-20%=80%
homology. No adjustments are made if the non-matched or
non-aligned sequences are internal or under any other
conditions.

[0265] In addition to the methods described above, other
protocols are available to obtain extended cDNAs using
5' ESTs or consensus contigated 5' ESTs, as outlined in
the following paragraphs.
[0266] Extended cDNAs may be prepared by obtaining mRNA
from the tissue, cell, or organism of interest using
mRNA preparation procedures utilizing polyA selection or
other techniques known to those skilled in the art. A
first primer capable of hybridizing to the polyA tail of
the mRNA is hybridized to the mRNA, and a reverse
transcription reaction is performed to generate a first
cDNA strand.

[0267] The first cDNA strand is hybridized to a second
primer containing at least 10 consecutive nucleotides of
the sequences of SEQ ID NOs 24-3883 and 7744-19335.
Preferably, the primer comprises at least 10, 12, 15, 17,
18, 20, 23, 25, or 28 consecutive nucleotides from the
sequences of SEQ ID NOs 24-3883 and 7744-19335. In some
embodiments, the primer comprises more than 30
nucleotides from the sequences of SEQ ID NOs 24-3883 and
7744-19335. If it is desired to obtain extended cDNAs
containing the full protein coding sequence, including
the authentic translation initiation site, the second
primer used contains sequences located upstream of the
translation initiation site. The second primer is
extended to generate a second cDNA strand complementary
to the first cDNA strand. Alternatively, RT-PCR may be
performed as described above using primers from both
ends of the cDNA to be obtained.
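Picking a second primer located upstream of the translation initiation site can be sketched as follows. This Python example is hypothetical: it assumes the first ATG in the sequence is the authentic start codon and simply takes the adjacent upstream window, enforcing the 10-nucleotide minimum mentioned above and reporting G+C content. Real primer design would also consider melting temperature, self-complementarity and specificity.

```python
def gc_percent(seq):
    """G+C content of a DNA sequence, as a percentage."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def upstream_sense_primer(sequence, length=20, start_codon="ATG", min_length=10):
    """Return (primer, G+C %) for the `length` nt immediately 5' of the start codon.

    Assumes (hypothetically) that the first occurrence of `start_codon` marks
    the authentic translation initiation site.
    """
    if length < min_length:
        raise ValueError(f"primer must be at least {min_length} nucleotides")
    sequence = sequence.upper()
    start = sequence.find(start_codon)
    if start == -1:
        raise ValueError("start codon not found")
    if start < length:
        raise ValueError("not enough sequence upstream of the start codon")
    primer = sequence[start - length:start]
    return primer, gc_percent(primer)


if __name__ == "__main__":
    print(upstream_sense_primer("CCGGAATTCGATGGCA", length=10))  # ('CCGGAATTCG', 60.0)
```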

[0268] Extended cDNAs containing 5' fragments of the
mRNA may be prepared by hybridizing an mRNA comprising
the sequences of SEQ ID NOs. 24-3883 and 7744-19335 with
a primer comprising a sequence complementary to a
fragment of an EST-related nucleic acid, hybridizing the
primer to the mRNAs, and reverse transcribing the
hybridized primer to make a first cDNA strand from the
mRNAs. Preferably, the primer comprises at least 10, 12,
15, 17, 18, 20, 23, 25, or 28 consecutive nucleotides of
the sequences complementary to SEQ ID NOs. 24-3883 and
7744-19335.
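Deriving a primer complementary to a chosen fragment is a reverse-complement operation. The Python sketch below is illustrative; its function name is hypothetical, and the only constraint it enforces is the 10-nucleotide minimum length mentioned above. It accepts either mRNA (with U) or the corresponding sense DNA and returns the primer 5' to 3' as DNA.

```python
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def antisense_primer(template, start, length, min_length=10):
    """DNA primer complementary to template[start:start + length], written 5'->3'.

    The primer anneals to the template strand, so it can prime reverse
    transcription toward the 5' end of the mRNA.
    """
    if length < min_length:
        raise ValueError(f"primer must be at least {min_length} nucleotides")
    if start < 0:
        raise ValueError("start must be non-negative")
    fragment = template.upper()[start:start + length]
    if len(fragment) != length:
        raise ValueError("fragment extends past the end of the template")
    return fragment.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(antisense_primer("AUGGCCAAGUUU", 0, 10))  # ACTTGGCCAT
```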

[0269] Thereafter, a second cDNA strand comple- 
mentary to the first cDNA strand is synthesized. The 
second cDNA strand may be made by hybridizing a 
primer complementary to sequences in the first cDNA 
strand to the first cDNA strand and extending the primer 
to generate the second cDNA strand. 
[0270] The double-stranded extended cDNAs made using the
methods described above are isolated and cloned. The
extended cDNAs may be cloned into vectors such as
plasmids or viral vectors capable of replicating in an
appropriate host cell. For example, the host cell may be
a bacterial, mammalian, avian, or insect cell.
[0271] Techniques for isolating mRNA, reverse
transcribing a primer hybridized to mRNA to generate a
first cDNA strand, extending a primer to make a second
cDNA strand complementary to the first cDNA strand,
isolating the double-stranded cDNA, and cloning the
double-stranded cDNA are well known to those skilled in
the art and are described in Current Protocols in
Molecular Biology, John Wiley & Sons, Inc. 1997 and
Sambrook et al., Molecular Cloning: A Laboratory Manual,
Second Edition, Cold Spring Harbor Laboratory Press,
1989, the entire disclosures of which are incorporated
herein by reference.

[0272] Alternatively, other procedures may be used for
obtaining full-length cDNAs or extended cDNAs. In one
approach, full-length or extended cDNAs are prepared
from mRNA and cloned into double-stranded phagemids. The
cDNA library in the double-stranded phagemids is then
rendered single-stranded by treatment with an
endonuclease, such as the Gene II product of the phage
F1, and an exonuclease (Chang et al., Gene 127:95-98,
1993). A biotinylated oligonucleotide comprising the
sequence of a fragment of an EST-related nucleic acid is
hybridized to the single-stranded phagemids. Preferably,
the fragment comprises at least 10, 12, 15, 17, 18, 20,
23, 25, or 28 consecutive nucleotides of the sequences
of SEQ ID NOs. 24-3883 and 7744-19335.

[0273] Hybrids between the biotinylated oligonucleotide
and phagemids are isolated by incubating the hybrids
with streptavidin-coated paramagnetic beads and
retrieving the beads with a magnet (Fry et al.,
Biotechniques, 13:124-131, 1992). Thereafter, the
resulting phagemids are released from the beads and
converted into double-stranded DNA using a primer
specific for the 5' EST or consensus contigated 5'EST
sequence used to design the biotinylated
oligonucleotide. Alternatively, protocols such as the
Gene Trapper kit (Gibco BRL) may be used. The resulting
double-stranded DNA is transformed into bacteria.
Extended cDNAs or full-length cDNAs containing the 5'
EST or consensus contigated 5'EST sequence are
identified by colony PCR or colony hybridization.
EXAMPLE 19

Extended cDNAs and Full Length cDNAs 



[0274] The procedure described in Example 17 was used to
obtain extended cDNAs or full-length cDNAs derived from
5' ESTs in a variety of tissues. The following list
provides a few examples of cDNAs thus obtained.
[0275] Using this procedure, the full-length cDNA of SEQ
ID NO. 1 (internal identification number 58-34-2-E7-FL2)
was obtained. This cDNA encodes the signal peptide
NWWFQQGLSFLPSALVIWTSA (SEQ ID NO. 2) having a von Heijne
score of 5.5.
[0276] Using this approach, the full-length cDNA of SEQ
ID NO. 3 (internal identification number 48-19-3-G1-FL1)
was obtained. This cDNA encodes the signal peptide
MKKVLLLITAILAVAVG (SEQ ID NO. 4) having a von Heijne
score of 8.2.
[0277] The full-length cDNA of SEQ ID NO. 5 (internal
identification number 58-35-2-F10-FL2) was also obtained
using this procedure. This cDNA encodes the signal
peptide LWLLFFLVTAIHA (SEQ ID NO. 6) having a von Heijne
score of 10.7.

[0278] Furthermore, the polypeptides encoded by the
extended or full-length cDNAs may be screened for the
presence of known structural or functional motifs or for
the presence of signatures, which are short amino acid
sequences that are well conserved among the members of a
protein family. The results obtained for the
polypeptides encoded by a few full-length cDNAs derived
from 5' ESTs that were screened for the presence of known
protein signatures and motifs using the Proscan software
from the GCG package and the Prosite 15.0 database are
provided below.
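Signature screening of this kind can be approximated by translating PROSITE patterns into regular expressions. The Python sketch below is not the GCG Proscan program; it handles only the basic PROSITE pattern syntax (x, [..], {..}, repeat counts, and < or > on whole elements; no '>' inside brackets, no rules or profiles). It is shown with the N-glycosylation pattern N-{P}-[ST]-{P} as an example.

```python
import re

# Splits a pattern element into its core and an optional (n) or (n,m) repeat.
_REPEAT = re.compile(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Convert a simple PROSITE pattern to a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        n_term = element.startswith("<")
        c_term = element.endswith(">")
        element = element.strip("<>")
        core, low, high = _REPEAT.fullmatch(element).groups()
        if core == "x":
            regex = "."
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"  # forbidden residues
        else:
            regex = core  # single residue or [..] set: same syntax as regex
        if low:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if n_term else "") + regex + ("$" if c_term else ""))
    return "".join(parts)


def find_motif(protein, pattern):
    """0-based start positions of a PROSITE pattern, overlapping hits included."""
    regex = re.compile("(?=" + prosite_to_regex(pattern) + ")")
    return [m.start() for m in regex.finditer(protein.upper())]


if __name__ == "__main__":
    print(prosite_to_regex("N-{P}-[ST]-{P}."))        # N[^P][ST][^P]
    print(find_motif("NNSTNGTP", "N-{P}-[ST]-{P}."))  # [0, 1]
```

The zero-width lookahead in `find_motif` reports overlapping occurrences, which matters for short signatures that can share residues.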

[0279] The protein of SEQ ID NO. 8 encoded by the
full-length cDNA SEQ ID NO. 7 (internal designation
78-8-3-E6-CL0_1C) and expressed in adult prostate
belongs to the phosphatidylethanolamine-binding protein
family, whose characteristic PROSITE signature it
exhibits. Proteins from this widespread family, found in
species ranging from nematodes to fly, yeast, rodents and
primates, bind hydrophobic ligands such as phospholipids
and nucleotides. They are mostly expressed in brain and
in testis and are thought to play a role in cell growth
and/or maturation, in the regulation of sperm maturation
and motility, and in membrane remodeling. They may act
either through signal transduction or through
oxidoreduction reactions (for a review see Schoentgen and
Jolles, FEBS Letters, 369:22-26 (1995)). Taken together,
these data suggest that the protein of SEQ ID NO. 8 may
play a role in cell growth, maturation and membrane
remodeling and/or may be related to male fertility.
Thus, this protein may be useful in diagnosing and/or
treating cancer, neurodegenerative diseases, and/or
disorders related to male fertility and sterility.
[0280] The protein of SEQ ID NO. 10 encoded by the
extended cDNA SEQ ID NO. 9 (internal designation
59-9-2-E6-FL0_1C) belongs to the stomatin or band 7
family. Human stomatin is an integral membrane
phosphoprotein thought to regulate cation conductance by
interacting with other proteins of the junctional
complex of the membrane skeleton (Gallagher and Forget,
J. Biol. Chem., 270:26358-26363 (1995)). The protein of
SEQ ID NO. 10 exhibits the PROSITE signature typical of
the band 7 family. Taken together, these data suggest
that the protein of SEQ ID NO. 10 plays a role in the
regulation of ion transport, and hence in the control of
cellular volume. This protein may find applications in
diagnosing and/or treating stomatocytosis and/or
cryohydrocytosis.

[0281] The protein of SEQ ID NO. 12 encoded by the
extended cDNA SEQ ID NO. 11 (internal designation
19-10-1-C2-CL13) shows homology with the bovine subunit
B14.5B of the NADH-ubiquinone oxidoreductase complex
(Arizmendi et al., FEBS Lett., 313:80-84 (1992) and
Swissprot accession number Q02827). This complex is the
first of four complexes located in the inner
mitochondrial membrane which make up the mitochondrial
electron transport chain. Complex I is involved in the
dehydrogenation of NADH and the transportation of
electrons to coenzyme Q. It is composed of 7 subunits
encoded by the mitochondrial genome and 34 subunits
encoded by the nuclear genome. It is also thought to
play a role in the regulation of apoptosis and necrosis.
Mitochondriocytopathies due to complex I deficiency are
frequently encountered and affect tissues with a high
energy demand such as brain (mental retardation,
convulsions, movement disorders), heart (cardiomyopathy,
conduction disorders), kidney (Fanconi syndrome),
skeletal muscle (exercise intolerance, muscle weakness,
hypotonia) and/or eye (ophthalmoplegia, ptosis, cataract
and retinopathy). For a review on complex I see Smeitink
et al., Hum. Mol. Genet., 7:1573-1579 (1998). Taken
together, these data suggest that the protein of SEQ ID
NO. 12 may be part of the mitochondrial
energy-generating system, probably as a subunit of the
NADH-ubiquinone oxidoreductase complex. Thus, this
protein or part thereof may find applications in
diagnosing and/or treating several disorders including,
but not limited to, brain disorders (mental retardation,
convulsions, movement disorders), heart disorders
(cardiomyopathy, conduction disorders), kidney disorders
(Fanconi syndrome), skeletal muscle disorders (exercise
intolerance, muscle weakness, hypotonia) and/or eye
disorders (ophthalmoplegia, ptosis, cataract and
retinopathy).

[0282] The protein of SEQ ID NO. 14 encoded by the
extended cDNA SEQ ID NO. 13 (internal designation
77-13-1-C11-FL2_2C) exhibits extensive homology with a
murine protein named MP1, for MEK binding partner 1
(Genbank accession number AF082526). MP1 was shown to
enhance enzymatic activation of the mitogen-activated
protein (MAP) kinase cascade. The MAP kinase pathway is
one of the important enzymatic cascades that are
conserved among all eukaryotes from yeast to human. This
kind of pathway is involved in vital functions such as
the regulation of growth, differentiation and apoptosis.
MP1 probably acts by facilitating the interaction of the
two sequentially acting kinases MEK1 and ERK1 (Schaeffer
et al., Science, 281:1668-1671 (1998)). Taken together,
these data suggest that the protein of SEQ ID NO. 14 may
be involved in regulating protein-protein interactions
in signal transduction pathways. Thus, this protein may
be useful in diagnosing and/or treating several types of
disorders including, but not limited to, cancer,
neurodegenerative diseases, cardiovascular disorders,
hypertension, renal injury and repair, and septic shock.

[0283] Bacterial clones containing plasmids carrying the full-length cDNAs described above are presently stored in the inventor's laboratories under the internal identification numbers provided above. The inserts may be recovered from the deposited materials by growing an aliquot of the appropriate bacterial clone in the appropriate medium. The plasmid DNA can then be isolated using plasmid isolation procedures familiar to those skilled in the art, such as alkaline lysis minipreps or large-scale alkaline lysis plasmid isolation procedures. If desired, the plasmid DNA may be further enriched by centrifugation on a cesium chloride gradient, size exclusion chromatography, or anion exchange chromatography. The plasmid DNA obtained using these procedures may then be manipulated using standard cloning techniques familiar to those skilled in the art. Alternatively, a PCR can be performed with primers designed at both ends of the EST insertion. The PCR product, which corresponds to the 5'EST, can then be manipulated using standard cloning techniques familiar to those skilled in the art.

[0284] Using any of the methods described above in section IV, a plurality of extended cDNAs containing full-length protein coding sequences or portions of the protein coding sequences may be provided as cDNA libraries for subsequent evaluation of the encoded proteins or for use in diagnostic assays as described below.
V. Expression of Proteins Encoded by EST-related Nucleic Acids

[0285] EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, and fragments of positional segments of EST-related nucleic acids may be used to express the polypeptides which they encode. In particular, they may be used to express EST-related polypeptides, fragments of EST-related polypeptides, positional segments of EST-related polypeptides, or fragments of positional segments of EST-related polypeptides. In some embodiments, the EST-related nucleic acids, positional segments of EST-related nucleic acids, and fragments of positional segments of EST-related nucleic acids may be used to express the full polypeptide (i.e. the signal peptide and the mature polypeptide) of a secreted protein, the mature protein (i.e. the polypeptide generated after cleavage of the signal peptide), or the signal peptide of a secreted protein. If desired, nucleic acids encoding the signal peptide may be used to facilitate secretion of the expressed protein. It will be appreciated that a plurality of EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of EST-related nucleic acids may be simultaneously cloned into expression vectors to create an expression library for analysis of the encoded proteins as described below.

EXAMPLE 20 

Expression of the Proteins Encoded by the Genes 
Corresponding to the 5'ESTs or Consensus Contigated 
5'ESTs 

[0286] To express their encoded proteins, the EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of EST-related nucleic acids are cloned into a suitable expression vector. In some instances, nucleic acids encoding EST-related polypeptides, fragments of EST-related polypeptides, positional segments of EST-related polypeptides or fragments of positional segments of EST-related polypeptides may be cloned into a suitable expression vector.

[0287] In some embodiments, the nucleic acids inserted into the expression vector may comprise the coding sequence of a sequence selected from the group consisting of SEQ ID NOs. 24-3883. In other embodiments, the nucleic acids inserted into the expression vector may comprise the full coding sequence (i.e. the nucleotides encoding the signal peptide and the mature polypeptide) of one of SEQ ID NOs. 1339-2059. In some embodiments, the nucleic acid inserted into the expression vector may comprise the nucleotides of one of the sequences of SEQ ID NOs. 1339-2059 which encode the mature polypeptide (i.e. the nucleotides encoding the polypeptide generated after cleavage of the signal peptide). In further embodiments, the nucleic acids inserted into the expression vector may comprise the nucleotides of SEQ ID NOs. 24-383 and 1339-2059 which encode the signal peptide, to facilitate secretion of the expressed protein. The nucleic acids inserted into the expression vectors may also contain sequences upstream of the sequences encoding the signal peptide, such as sequences which regulate expression levels or sequences which confer tissue-specific expression.
[0288] The nucleic acid inserted into the expression vector may encode a polypeptide comprising one of the sequences of SEQ ID NOs. 3884-7743. In some embodiments, the nucleic acid inserted into the expression vector may encode the full polypeptide sequence (i.e. the signal peptide and the mature polypeptide) included in one of SEQ ID NOs. 5199-5919. In other embodiments, the nucleic acid inserted into the expression vector may encode the mature polypeptide (i.e. the polypeptide generated after cleavage of the signal peptide) included in one of the sequences of SEQ ID NOs. 5199-5919. In further embodiments, the nucleic acids inserted into the expression vector may encode the signal peptide included in one of the sequences of SEQ ID NOs. 3884-4243 and 5199-5919.

[0289] The nucleic acid encoding the protein or polypeptide to be expressed is operably linked to a promoter in an expression vector using conventional cloning technology. The expression vector may be any of the mammalian, yeast, insect or bacterial expression systems known in the art. Commercially available vectors and expression systems are available from a variety of suppliers including Genetics Institute (Cambridge, MA), Stratagene (La Jolla, California), Promega (Madison, Wisconsin), and Invitrogen (San Diego, California). If desired, to enhance expression and facilitate proper protein folding, the codon context and codon pairing of the sequence may be optimized for the particular expression organism into which the expression vector is introduced, as explained by Hatfield et al., U.S. Patent No. 5,082,767, incorporated herein by this reference.
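As a rough illustration of adapting a coding sequence to an expression host, the sketch below performs simple single-codon replacement: each codon is swapped for a preferred synonymous codon taken from a lookup table, so the encoded amino acid sequence is unchanged. This is a deliberately simplified stand-in, not the codon-context and codon-pair method of Hatfield et al.; the preference table, the toy codon table and the function names are hypothetical and cover only a few amino acids.

```python
# Standard genetic code, restricted to the few codons used in this illustration.
CODON_TO_AA = {
    "ATG": "M", "CTG": "L", "CTT": "L", "TTA": "L", "CTA": "L",
    "AAA": "K", "AAG": "K", "GGT": "G", "GGC": "G", "GGA": "G",
    "TAA": "*", "TGA": "*", "TAG": "*",
}

# Hypothetical host preference table: one preferred codon per amino acid.
PREFERRED_CODON = {"M": "ATG", "L": "CTG", "K": "AAG", "G": "GGC", "*": "TAA"}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence using the toy codon table."""
    return "".join(CODON_TO_AA[cds[i:i + 3]] for i in range(0, len(cds), 3))


def optimize_codons(cds: str) -> str:
    """Replace each codon with the host-preferred synonymous codon."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(cds), 3):
        aa = CODON_TO_AA[cds[i:i + 3]]  # KeyError for codons outside the toy table
        out.append(PREFERRED_CODON[aa])
    return "".join(out)


original = "ATGTTAAAAGGTTGA"
optimized = optimize_codons(original)
print(optimized)  # ATGCTGAAGGGCTAA
assert translate(original) == translate(optimized)  # same protein, different codons
```

A codon-pair approach such as the one cited above would additionally score adjacent codon pairs rather than choosing each codon independently.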
[0290] The following is provided as one exemplary method to express the proteins encoded by the nucleic acids described above. In some instances, the nucleic acid encoding the protein or polypeptide to be expressed includes a methionine initiation codon and a polyA signal. If the nucleic acid encoding the polypeptide to be expressed lacks a methionine to serve as the initiation site, an initiating methionine can be introduced next to the first codon of the nucleic acid using conventional techniques. Similarly, if the nucleic acid encoding the protein or polypeptide to be expressed lacks a polyA signal, this sequence can be added to the construct by, for example, splicing out the polyA signal from pSG5 (Stratagene) using BglI and SalI restriction endonuclease enzymes and incorporating it into the mammalian expression vector pXT1 (Stratagene). pXT1 contains the LTRs and a portion of the gag gene from Moloney Murine Leukemia Virus. The position of the LTRs in the construct allows efficient stable transfection. The vector includes the Herpes Simplex thymidine kinase promoter and the selectable neomycin gene. The nucleic acid encoding the polypeptide to be expressed is obtained by PCR from the bacterial vector using oligonucleotide primers complementary to the nucleic acid encoding the protein or polypeptide to be expressed and containing restriction endonuclease sequences for PstI incorporated into the 5' primer and BglII at the 5' end of the 3' primer, taking care to ensure that the nucleic acid encoding the protein or polypeptide to be expressed is correctly positioned with respect to the polyA signal. The purified fragment obtained from the resulting PCR reaction is digested with PstI, blunt ended with an exonuclease, digested with BglII, purified and ligated to pXT1, now containing a polyA signal and digested with BglII.
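The primer design step described above can be sketched as follows, assuming the insert is available as a plain DNA string. The sketch adds an ATG if the coding sequence lacks one, places a PstI site (CTGCAG) at the 5' end of the forward primer and a BglII site (AGATCT) at the 5' end of the reverse primer, and rejects inserts that contain either site internally, since those would be cut during the digests. The function names, the 18-nt annealing length and the 4-nt "clamp" are hypothetical choices for illustration, not values taken from this specification.

```python
PSTI_SITE = "CTGCAG"   # PstI recognition sequence (palindromic)
BGLII_SITE = "AGATCT"  # BglII recognition sequence (palindromic)
CLAMP = "GCGC"         # hypothetical extra bases so the enzyme can cut near the primer end

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def ensure_start_codon(cds: str) -> str:
    """Prepend ATG when the coding sequence does not already start with one."""
    cds = cds.upper()
    return cds if cds.startswith("ATG") else "ATG" + cds


def design_primers(cds: str, anneal_len: int = 18) -> tuple[str, str]:
    """Return (forward, reverse) primers carrying PstI and BglII tails."""
    cds = ensure_start_codon(cds)
    # Both sites are palindromic, so scanning one strand finds sites on either strand.
    for name, site in (("PstI", PSTI_SITE), ("BglII", BGLII_SITE)):
        if site in cds:
            raise ValueError(f"insert contains an internal {name} site")
    forward = CLAMP + PSTI_SITE + cds[:anneal_len]
    reverse = CLAMP + BGLII_SITE + reverse_complement(cds[-anneal_len:])
    return forward, reverse


fwd, rev = design_primers("GCCAAAGGTTTCGAACTCTAA")  # hypothetical insert lacking ATG
print(fwd)  # GCGCCTGCAGATGGCCAAAGGTTTCGAA
print(rev)  # GCGCAGATCTTTAGAGTTCGAAACCTTT
```

Real primer design would also check melting temperature, secondary structure and the reading frame relative to the vector, which this sketch does not attempt.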
[0291] The ligated product is transfected into mouse NIH 3T3 cells using Lipofectin (Life Technologies, Inc., Grand Island, New York) under conditions outlined in the product specification. Positive transfectants are selected after growing the transfected cells in 600 µg/ml G418 (Sigma, St. Louis, Missouri).
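Preparing the selective medium is a simple C1·V1 = C2·V2 dilution. The sketch below computes how much G418 stock to add to reach the 600 µg/ml working concentration; the 50 mg/ml stock concentration and the 500 ml medium volume are hypothetical example values, and the calculation ignores the difference between nominal and active G418 potency, which can vary from lot to lot.

```python
def stock_volume_ml(target_ug_per_ml: float, final_volume_ml: float,
                    stock_mg_per_ml: float) -> float:
    """Stock volume (ml) such that C_stock * V_stock = C_final * V_final.

    final_volume_ml is the total volume after the stock has been added.
    """
    stock_ug_per_ml = stock_mg_per_ml * 1000.0  # convert mg/ml to µg/ml
    return target_ug_per_ml * final_volume_ml / stock_ug_per_ml


# Hypothetical example: 500 ml of selective medium from a 50 mg/ml G418 stock.
v = stock_volume_ml(600.0, 500.0, 50.0)
print(f"{v:.1f} ml of stock")  # 6.0 ml of stock
```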

[0292] Alternatively, the nucleic acid encoding the protein or polypeptide to be expressed may be cloned into pED6dpc2 as described above. The resulting pED6dpc2 constructs may be transfected into a suitable host cell, such as COS-1 cells. Methotrexate-resistant cells are selected and expanded. The expressed protein or polypeptide may be isolated, purified, or enriched as described above.

[0293] To confirm expression of the desired protein or polypeptide, the proteins or polypeptides produced by cells containing a vector with a nucleic acid insert encoding the protein or polypeptide are compared to those lacking such an insert. The expressed proteins are detected using techniques familiar to those skilled in the art, such as Coomassie blue or silver staining, or using antibodies against the protein or polypeptide encoded by the nucleic acid insert. Antibodies capable of specifically recognizing the protein of interest may be generated using synthetic 15-mer peptides having a sequence encoded by the appropriate nucleic acid. The synthetic peptides are injected into mice to generate antibody to the polypeptide encoded by the nucleic acid.
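Choosing candidate synthetic 15-mer peptides amounts to sliding a 15-residue window along the encoded polypeptide. The sketch below enumerates such windows with a configurable step; the step size, the function name and the example sequence are hypothetical, and practical immunogen selection would also weigh factors such as hydrophilicity and surface exposure, which this sketch does not model.

```python
def candidate_peptides(protein: str, length: int = 15,
                       step: int = 5) -> list[tuple[int, str]]:
    """Return (1-based start position, peptide) for each window of `length` residues."""
    protein = protein.upper().rstrip("*")  # drop a trailing stop symbol if present
    if len(protein) < length:
        return []
    return [(i + 1, protein[i:i + length])
            for i in range(0, len(protein) - length + 1, step)]


seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # hypothetical 33-residue sequence
for start, pep in candidate_peptides(seq):
    print(start, pep)
```

With a step larger than 1, the last few C-terminal residues may not be covered by any window; a step of 1 enumerates every possible 15-mer.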
[0294] If the proteins or polypeptides encoded by the nucleic acid inserts are secreted, medium prepared from the host cells or organisms containing an expression vector which contains a nucleic acid insert encoding the desired protein or polypeptide is compared to medium prepared from the control cells or organism. The presence of a band in medium from the cells containing the nucleic acid insert which is absent from preparations from the control cells indicates that the protein or polypeptide encoded by the nucleic acid insert is being expressed and secreted. Generally, the band corresponding to the protein encoded by the nucleic acid insert will have a mobility near that expected based on the number of amino acids in the open reading frame of the nucleic acid insert. However, the band may have a mobility different from that expected as a result of modifications such as glycosylation, ubiquitination, or enzymatic cleavage.
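The "expected mobility" is usually estimated from the length of the open reading frame, using an average mass of roughly 110 Da per amino acid residue. A minimal sketch, assuming the insert is given as an in-frame coding sequence starting at its first codon; it uses the ~110 Da rule of thumb rather than exact residue masses, and, as noted above, modifications such as glycosylation are not accounted for. The optional signal-peptide length lets the estimate reflect the mature secreted form; the example ORF is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
AVG_RESIDUE_DA = 110.0  # rule-of-thumb average residue mass in daltons


def orf_length_codons(cds: str) -> int:
    """Count sense codons from the first codon up to, not including, the first in-frame stop."""
    cds = cds.upper()
    n = 0
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            break
        n += 1
    return n


def expected_kda(cds: str, signal_peptide_len: int = 0) -> float:
    """Approximate product size (kDa), optionally after signal peptide removal."""
    residues = orf_length_codons(cds) - signal_peptide_len
    return residues * AVG_RESIDUE_DA / 1000.0


cds = "ATG" + "GCT" * 299 + "TAA"  # hypothetical 300-codon ORF
print(expected_kda(cds))  # 33.0
print(expected_kda(cds, signal_peptide_len=20))
```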

[0295] Alternatively, if the protein expressed from the above expression vectors does not contain sequences directing its secretion, the proteins expressed from host cells containing an expression vector with an insert encoding a secreted protein or portion thereof can be compared to the proteins expressed in control host cells containing the expression vector without an insert. The presence of a band in samples from cells containing the expression vector with an insert which is absent in samples from cells containing the expression vector without an insert indicates that the desired protein or portion thereof is being expressed. Generally, the band will have the mobility expected for the secreted protein or portion thereof. However, the band may have a mobility different from that expected as a result of modifications such as glycosylation, ubiquitination, or enzymatic cleavage.

[0296] The expressed protein or polypeptide may be purified, isolated or enriched using a variety of methods. In some methods, the protein or polypeptide may be secreted into the culture medium via a native signal peptide or a heterologous signal peptide operably linked thereto. In some methods, the protein or polypeptide may be linked to a heterologous polypeptide which facilitates its isolation, purification, or enrichment, such as a nickel binding polypeptide. The protein or polypeptide may also be obtained by gel electrophoresis, ion exchange chromatography, size exclusion chromatography, HPLC, salt precipitation, immunoprecipitation, a combination of any of the preceding methods, or any of the isolation, purification, or enrichment techniques familiar to those skilled in the art.

[0297] The protein encoded by the nucleic acid insert may also be purified using standard immunochromatography techniques, such as immunoaffinity chromatography with antibodies directed against the encoded protein or polypeptide, as described in more detail below. If antibody production is not possible, the nucleic acid insert encoding the desired protein or polypeptide may be incorporated into expression vectors designed for use in purification schemes employing chimeric polypeptides. In such strategies, the coding sequence of the nucleic acid insert is ligated in frame with the gene encoding the other half of the chimera. The other half of the chimera may be β-globin or a nickel binding polypeptide. A chromatography matrix having antibody to β-globin or nickel attached thereto is then used to purify the chimeric protein. Protease cleavage sites may be engineered between the β-globin gene or the nickel binding polypeptide and the extended cDNA or portion thereof. Thus, the two polypeptides of the chimera may be separated from one another by protease digestion.
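The chimera strategy can be illustrated at the protein sequence level. The sketch below joins a hexahistidine tag (one common nickel binding polypeptide) to a target polypeptide through a Factor Xa recognition sequence (IEGR, cleaved after the arginine), then simulates protease digestion. The choice of His6 and Factor Xa, the function names and the example target are assumptions for illustration; the specification does not name a particular tag or protease. A real construct also requires the coding sequences to be joined in frame at the DNA level, and the sketch checks only the IEGR variant of the Factor Xa site.

```python
HIS_TAG = "HHHHHH"       # hexahistidine: binds immobilized nickel
FACTOR_XA_SITE = "IEGR"  # Factor Xa cleaves after the R of this site


def build_chimera(target: str, tag: str = HIS_TAG,
                  site: str = FACTOR_XA_SITE) -> str:
    """Initiating Met + tag + protease site + target polypeptide."""
    if site in target:
        raise ValueError("target contains an internal protease site")
    return "M" + tag + site + target


def protease_digest(chimera: str, site: str = FACTOR_XA_SITE) -> tuple[str, str]:
    """Split the chimera immediately after the first occurrence of the site."""
    cut = chimera.index(site) + len(site)
    return chimera[:cut], chimera[cut:]


target = "SKGEELFTGV"  # hypothetical target fragment
chimera = build_chimera(target)
tag_part, released = protease_digest(chimera)
print(chimera)   # MHHHHHHIEGRSKGEELFTGV
print(released)  # SKGEELFTGV
```

Placing the site directly before the target means the released polypeptide carries no leftover tag residues, which is one reason Factor Xa style sites are often chosen for this purpose.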
[0298] One useful expression vector for generating β-globin chimerics is pSG5 (Stratagene), which encodes rabbit β-globin. Intron II of the rabbit β-globin gene facilitates splicing of the expressed transcript, and the polyadenylation signal incorporated into the construct increases the level of expression. These techniques are well known to those skilled in the art of molecular biology. Standard methods are published in methods texts such as Davis et al. (Basic Methods in Molecular Biology, L.G. Davis, M.D. Dibner, and J.F. Battey, eds., Elsevier Press, NY, 1986), and many of the methods are available from Stratagene, Life Technologies, Inc., or Promega. Polypeptides may additionally be produced from the construct using in vitro translation systems such as the In vitro Express™ Translation Kit (Stratagene).

[0299] Following expression and purification of the proteins or polypeptides encoded by the nucleic acid inserts, the purified proteins may be tested for the ability to bind to the surface of various cell types as described in Example 21 below. It will be appreciated that a plurality of proteins expressed from these nucleic acid inserts may be included in a panel of proteins to be simultaneously evaluated for the activities specifically described below, as well as other biological roles for which assays for determining activity are available.

EXAMPLE 21 

Analysis of Secreted Proteins or Polypeptides to
Determine Whether They Bind to the Cell Surface

[0300] The EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, fragments of positional segments of EST-related nucleic acids, nucleic acids encoding the EST-related polypeptides, nucleic acids encoding fragments of the EST-related polypeptides, nucleic acids encoding positional segments of EST-related polypeptides, or nucleic acids encoding fragments of positional segments of EST-related polypeptides are cloned into expression vectors such as those described in Example 20. The encoded proteins or polypeptides are purified, isolated, or enriched as described above. Following purification, isolation, or enrichment, the proteins or polypeptides are labeled using techniques known to those skilled in the art. The labeled proteins or polypeptides are incubated with cells or cell lines derived from a variety of organs or tissues to allow the proteins to bind to any receptor present on the cell surface. Following the incubation, the cells are washed to remove non-specifically bound proteins or polypeptides. The specifically bound labeled proteins or polypeptides are detected by autoradiography. Alternatively, unlabeled proteins or polypeptides may be incubated with the cells and detected with antibodies having a detectable label, such as a fluorescent molecule, attached thereto.
[0301] Specificity of cell surface binding may be analyzed by conducting a competition analysis in which various amounts of unlabeled protein or polypeptide are incubated along with the labeled protein or polypeptide. The amount of labeled protein or polypeptide bound to the cell surface decreases as the amount of competitive unlabeled protein or polypeptide increases. As a control, various amounts of an unlabeled protein or polypeptide unrelated to the labeled protein or polypeptide are included in some binding reactions. The amount of labeled protein or polypeptide bound to the cell surface does not decrease in binding reactions containing increasing amounts of unrelated unlabeled protein, indicating that the protein or polypeptide encoded by the nucleic acid binds specifically to the cell surface.
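The expected outcome of this competition experiment can be pictured with a simple single-site competitive binding model, in which labeled ligand L and unlabeled competitor I compete for the same receptor: B = Bmax·[L] / ([L] + Kd·(1 + [I]/Ki)). The sketch below assumes equilibrium and negligible ligand depletion, and all concentrations and constants are hypothetical. It shows labeled binding falling as unlabeled homologous competitor increases (Ki = Kd), whereas an unrelated protein, modeled as having no affinity for the receptor (Ki = infinity), leaves labeled binding unchanged.

```python
import math


def labeled_bound(L: float, I: float, Kd: float, Ki: float,
                  Bmax: float = 1.0) -> float:
    """Specific binding of labeled ligand L in the presence of competitor I.

    Single-site competitive model without ligand depletion:
        B = Bmax * L / (L + Kd * (1 + I / Ki))
    Ki = math.inf models a protein with no affinity for the receptor.
    """
    return Bmax * L / (L + Kd * (1.0 + I / Ki))


Kd = 1.0  # nM, hypothetical
L = 1.0   # nM labeled protein, hypothetical
for I in (0.0, 1.0, 10.0, 100.0):  # nM unlabeled competitor
    same = labeled_bound(L, I, Kd, Ki=Kd)               # unlabeled form of the same protein
    unrelated = labeled_bound(L, I, Kd, Ki=math.inf)    # unrelated control protein
    print(f"I={I:>5} nM  same protein: {same:.3f}  unrelated: {unrelated:.3f}")
```

Using the unlabeled form of the same protein as competitor (Ki equal to Kd) is the homologous case; a real analysis would fit Kd or IC50 to measured counts rather than assume them.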
[0302] As discussed above, human proteins have been shown to have a number of important physiological effects and, consequently, represent a valuable therapeutic resource. The human proteins or polypeptides made as described above may be evaluated to determine their physiological activities as described below.

EXAMPLE 22 

Assaying the Expressed Proteins or Polypeptides for 
Cytokine, Cell Proliferation or Cell Differentiation 
Activity 

[0303] As discussed above, some human proteins act as cytokines or may affect cellular proliferation or differentiation. Many protein factors discovered to date, including all known cytokines, have exhibited activity in one or more factor-dependent cell proliferation assays, and hence the assays serve as a convenient confirmation of cytokine activity. The activity of a protein or polypeptide of the present invention is evidenced by any one of a number of routine factor-dependent cell proliferation assays for cell lines including, without limitation, 32D, DA2, DA1G, T10, B9, B9/11, BaF3, MC9/G, M+ (preB M+), 2E8, RB5, DA1, 123, T1165, HT2, CTLL2, TF-1, Mo7e and CMK. The proteins or polypeptides prepared as described above may be evaluated for their ability to regulate T cell or thymocyte proliferation in assays such as those described above or in the following references, which are incorporated herein by reference: Current Protocols in Immunology, Ed. by J.E. Coligan et al., Greene Publishing Associates and Wiley-Interscience; Takai et al., J. Immunol. 137:3494-3500, 1986; Bertagnolli et al., J. Immunol. 145:1706-1712, 1990; Bertagnolli et al., Cellular Immunology 133:327-341, 1991; Bertagnolli et al., J. Immunol. 149:3778-3783, 1992; Bowman et al., J. Immunol. 152:1756-1761, 1994.
[0304] In addition, numerous assays for cytokine production and/or the proliferation of spleen cells, lymph node cells and thymocytes are known. These include the techniques disclosed in Current Protocols in Immunology, J.E. Coligan et al., Eds., 1:3.12.1-3.12.14, John Wiley and Sons, Toronto, 1994; and Schreiber, R.D., In Current Protocols in Immunology, supra, 1:6.8.1-6.8.8.

[0305] The proteins or polypeptides prepared as described above may also be assayed for the ability to regulate the proliferation and differentiation of hematopoietic or lymphopoietic cells. Many assays for such activity are familiar to those skilled in the art, including the assays in the following references, which are incorporated herein by reference: Bottomly et al., in Current Protocols in Immunology, supra, 1:6.3.1-6.3.12; deVries et al., J. Exp. Med. 173:1205-1211, 1991; Moreau et al., Nature 336:690-692, 1988; Greenberger et al., Proc. Natl. Acad. Sci. U.S.A. 80:2931-2938, 1983; Nordan, R., In Current Protocols in Immunology, supra, 1:6.6.1-6.6.5; Smith et al., Proc. Natl. Acad. Sci. U.S.A. 83:1857-1861, 1986; Bennett et al., in Current Protocols in Immunology, supra, 1:6.15.1; Ciarletta et al., in Current Protocols in Immunology, supra, 1:6.13.1.
[0306] The proteins or polypeptides prepared as described above may also be assayed for their ability to regulate T-cell responses to antigens. Many assays for such activity are familiar to those skilled in the art, including the assays described in the following references, which are incorporated herein by reference: Chapter 3 (In Vitro Assays for Mouse Lymphocyte Function), Chapter 6 (Cytokines and Their Cellular Receptors) and Chapter 7 (Immunologic Studies in Humans) in Current Protocols in Immunology, supra; Weinberger et al., Proc. Natl. Acad. Sci. USA 77:6091-6095, 1980; Weinberger et al., Eur. J. Immunol. 11:405-411, 1981; Takai et al., J. Immunol. 137:3494-3500, 1986; Takai et al., J. Immunol. 140:508-512, 1988.

Those proteins or polypeptides which exhibit cytokine, cell proliferation, or cell differentiation activity may then be formulated as pharmaceuticals and used to treat clinical conditions in which induction of cell proliferation or differentiation is beneficial. Alternatively, as described in more detail below, nucleic acids encoding these proteins or polypeptides, or nucleic acids regulating the expression of these proteins or polypeptides, may be introduced into appropriate host cells to increase or decrease the expression of the proteins or polypeptides as desired.

EXAMPLE 23

Assaying the Expressed Proteins or Polypeptides for 
Activity as Immune System Regulators 

[0307] The proteins or polypeptides prepared as described above may also be evaluated for their effects as immune regulators. For example, the proteins or polypeptides may be evaluated for their activity to influence thymocyte or splenocyte cytotoxicity. Numerous assays for such activity are familiar to those skilled in the art, including the assays described in the following references, which are incorporated herein by reference: Chapter 3 (In Vitro Assays for Mouse Lymphocyte Function 3.1-3.19) and Chapter 7 (Immunologic Studies in Humans) in Current Protocols in Immunology, J.E. Coligan et al., Eds., Greene Publishing Associates and Wiley-Interscience; Herrmann et al., Proc. Natl. Acad. Sci. USA 78:2488-2492, 1981; Herrmann et al., J. Immunol. 128:1968-1974, 1982; Handa et al., J. Immunol. 135:1564-1572, 1985; Takai et al., J. Immunol. 137:3494-3500, 1986; Takai et al., J. Immunol. 140:508-512, 1988; Bowman et al., J. Virology 61:1992-1998; Bertagnolli et al., Cell. Immunol. 133:327-341, 1991; Brown et al., J. Immunol. 153:3079-3092, 1994.
[0308] The proteins or polypeptides prepared as described above may also be evaluated for their effects on T-cell dependent immunoglobulin responses and isotype switching. Numerous assays for such activity are familiar to those skilled in the art, including the assays disclosed in the following references, which are incorporated herein by reference: Maliszewski, J. Immunol. 144:3028-3033, 1990; Mond et al., in Current Protocols in Immunology, 1:3.8.1-3.8.16, supra.
[0309] The proteins or polypeptides prepared as described above may also be evaluated for their effect on immune effector cells, including their effect on Th1 cells and cytotoxic lymphocytes. Numerous assays for such activity are familiar to those skilled in the art, including the assays disclosed in the following references, which are incorporated herein by reference: Chapter 3 (In Vitro Assays for Mouse Lymphocyte Function 3.1-3.19) and Chapter 7 (Immunologic Studies in Humans) in Current Protocols in Immunology, supra; Takai et al., J. Immunol. 137:3494-3500, 1986; Takai et al., J. Immunol. 140:508-512, 1988; Bertagnolli et al., J. Immunol. 149:3778-3783, 1992.

[0310] The proteins or polypeptides prepared as described above may also be evaluated for their effect on dendritic cell mediated activation of naive T-cells. Numerous assays for such activity are familiar to those skilled in the art, including the assays disclosed in the following references, which are incorporated herein by reference: Guery et al., J. Immunol. 134:536-544, 1995; Inaba et al., J. Exp. Med. 173:549-559, 1991; Macatonia et al., J. Immunol. 154:5071-5079, 1995; Porgador et al., J. Exp. Med. 182:255-260, 1995; Nair et al., J. Virol. 67:4062-4069, 1993; Huang et al., Science 264:961-965, 1994; Macatonia et al., J. Exp. Med. 169:1255-1264, 1989; Bhardwaj et al., Journal of Clinical Investigation 94:797-807, 1994; and Inaba et al., J. Exp. Med. 172:631-640, 1990.

[0311] The proteins or polypeptides prepared as described above may also be evaluated for their influence on the lifetime of lymphocytes. Numerous assays for such activity are familiar to those skilled in the art, including the assays disclosed in the following references, which are incorporated herein by reference: Darzynkiewicz et al., Cytometry 13:795-808, 1992; Gorczyca et al., Leukemia 7:659-670, 1993; Gorczyca et al., Cancer Res. 53:1945-1951, 1993; Itoh et al., Cell 66:233-243, 1991; Zacharchuk, J. Immunol. 145:4037-4045, 1990; Zamai et al., Cytometry 14:891-897, 1993; Gorczyca et al., Int. J. Oncol. 1:639-648, 1992.
[0312] The proteins or polypeptides prepared as described above may also be evaluated for their influence on early steps of T-cell commitment and development. Numerous assays for such activity are familiar to those skilled in the art, including without limitation the assays disclosed in the following references, which are incorporated herein by reference: Antica et al., Blood 84:111-117, 1994; Fine et al., Cell. Immunol. 155:111-122, 1994; Galy et al., Blood 85:2770-2778, 1995; Toki et al., Proc. Natl. Acad. Sci. USA 88:7548-7551, 1991.
[0313] Those proteins or polypeptides which exhibit activity as immune system regulators may then be formulated as pharmaceuticals and used to treat clinical conditions in which regulation of immune activity is beneficial. For example, the protein or polypeptide may be useful in the treatment of various immune deficiencies and disorders (including severe combined immunodeficiency), e.g., in regulating (up or down) growth and proliferation of T and/or B lymphocytes, as well as affecting the cytolytic activity of NK cells and other cell populations. These immune deficiencies may be genetic or be caused by viral (e.g., HIV) as well as bacterial or fungal infections, or may result from autoimmune disorders. More specifically, infectious diseases caused by viral, bacterial, fungal or other infection may be treatable using the protein or polypeptide, including infections by HIV, hepatitis viruses, herpesviruses, mycobacteria, Leishmania spp., Plasmodium, and various fungal infections such as candidiasis. Of course, in this regard, a protein or polypeptide may also be useful where a boost to the immune system generally may be desirable, i.e., in the treatment of cancer.

[0314] Alternatively, the proteins or polypeptides prepared as described above may be used in the treatment of autoimmune disorders including, for example, connective tissue disease, multiple sclerosis, systemic lupus erythematosus, rheumatoid arthritis, autoimmune pulmonary inflammation, Guillain-Barre syndrome, autoimmune thyroiditis, insulin dependent diabetes mellitus, myasthenia gravis, graft-versus-host disease and autoimmune inflammatory eye disease. Such a protein or polypeptide may also be useful in the treatment of allergic reactions and conditions, such as asthma (particularly allergic asthma) or other respiratory problems. Other conditions in which immune suppression is desired (including, for example, organ transplantation) may also be treatable using the protein or polypeptide.
[0315] Using the proteins or polypeptides of the invention, it may also be possible to regulate immune responses either up or down. Down regulation may involve inhibiting or blocking an immune response already in progress or may involve preventing the induction of an immune response. The functions of activated T-cells may be inhibited by suppressing T cell responses or by inducing specific tolerance in T cells, or both. Immunosuppression of T cell responses is generally an active non-antigen-specific process which requires continuous exposure of the T cells to the suppressive agent. Tolerance, which involves inducing non-responsiveness or anergy in T cells, is distinguishable from immunosuppression in that it is generally antigen-specific and persists after the end of exposure to the tolerizing agent. Operationally, tolerance can be demonstrated by the lack of a T cell response upon reexposure to specific antigen in the absence of the tolerizing agent.
[0316] Down regulating or preventing one or more antigen functions (including without limitation B lymphocyte antigen functions, such as, for example, B7 costimulation), e.g., preventing high level lymphokine synthesis by activated T cells, will be useful in situations of tissue, skin and organ transplantation and in graft-versus-host disease (GVHD). For example, blockage of T cell function should result in reduced tissue destruction in tissue transplantation. Typically, in tissue transplants, rejection of the transplant is initiated through its recognition as foreign by T cells, followed by an immune reaction that destroys the transplant. The administration of a molecule which inhibits or blocks interaction of a B7 lymphocyte antigen with its natural ligand(s) on immune cells (such as a soluble, monomeric form of a peptide having B7-2 activity alone or in conjunction with a monomeric form of a peptide having an activity of another B lymphocyte antigen (e.g., B7-1, B7-3) or blocking antibody), prior to transplantation, can lead to the binding of the molecule to the natural ligand(s) on the immune cells without transmitting the corresponding costimulatory signal. Blocking B lymphocyte antigen function in this manner prevents cytokine synthesis by immune cells, such as T cells, and thus acts as an immunosuppressant. Moreover, the lack of costimulation may also be sufficient to anergize the T cells, thereby inducing tolerance in a subject. Induction of long-term tolerance by B lymphocyte antigen-blocking reagents may avoid the necessity of repeated administration of these blocking reagents. To achieve sufficient immunosuppression or tolerance in a subject, it may also be necessary to block the function of a combination of B lymphocyte antigens.
[0317] The efficacy of particular blocking reagents in preventing organ transplant rejection or GVHD can be assessed using animal models that are predictive of efficacy in humans. Examples of appropriate systems which can be used include allogeneic cardiac grafts in rats and xenogeneic pancreatic islet cell grafts in mice, both of which have been used to examine the immunosuppressive effects of CTLA4Ig fusion proteins in vivo as described in Lenschow et al., Science 257:789-792 (1992) and Turka et al., Proc. Natl. Acad. Sci. USA 89:11102-11105 (1992). In addition, murine models of GVHD (see Paul ed., Fundamental Immunology, Raven Press, New York, 1989, pp. 846-847) can be used to determine the effect of blocking B lymphocyte antigen function in vivo on the development of that disease.
[0318] Blocking antigen function may also be therapeutically useful for treating autoimmune diseases. Many autoimmune disorders are the result of inappropriate activation of T cells that are reactive against self tissue and which promote the production of cytokines and autoantibodies involved in the pathology of the diseases. Preventing the activation of autoreactive T cells may reduce or eliminate disease symptoms. Administration of reagents which block costimulation of cells by disrupting receptor/ligand interactions of B lymphocyte antigens can be used to inhibit T cell activation and prevent production of autoantibodies or T cell-derived cytokines which are potentially involved in the disease process. Additionally, blocking reagents may induce antigen-specific tolerance of autoreactive T cells which could lead to long-term relief from the disease. The efficacy of blocking reagents in preventing or alleviating autoimmune disorders can be determined using a number of well-characterized animal models of human autoimmune diseases. Examples include murine experimental autoimmune encephalomyelitis, systemic lupus erythematosus in MRL/lpr/lpr mice or NZB hybrid mice, murine autoimmune collagen arthritis, diabetes mellitus in NOD mice and BB rats, and murine experimental myasthenia gravis (see Paul ed., Fundamental Immunology, Raven Press, New York, 1989, pp. 840-856).
[0319] Upregulation of an antigen function (preferably a B lymphocyte antigen function), as a means of upregulating immune responses, may also be useful in therapy. Upregulation of immune responses may involve either enhancing an existing immune response or eliciting an initial immune response, as shown by the following examples. For instance, enhancing an immune response through stimulating B lymphocyte antigen function may be useful in cases of viral infection. In addition, systemic viral diseases such as influenza, the common cold, and encephalitis might be alleviated by the systemic administration of a stimulatory form of B lymphocyte antigens.

[0320] Alternatively, antiviral immune responses may be enhanced in an infected patient by removing T cells from the patient, costimulating the T cells in vitro with viral antigen-pulsed APCs either expressing the proteins or polypeptides described above or together with a stimulatory form of the protein or polypeptide, and reintroducing the in vitro primed T cells into the patient. The infected cells would now be capable of delivering a costimulatory signal to T cells in vivo, thereby activating the T cells.

[0321] In another application, upregulation or enhancement of antigen function (preferably B lymphocyte antigen function) may be useful in the induction of tumor immunity. Tumor cells (e.g., sarcoma, melanoma, lymphoma, leukemia, neuroblastoma, carcinoma) transfected with one of the above-described nucleic acids encoding a protein or polypeptide can be administered to a subject to overcome tumor-specific tolerance in the subject. If desired, the tumor cell can be transfected to express a combination of peptides. For example, tumor cells obtained from a patient can be transfected ex vivo with an expression vector directing the expression of a peptide having B7-2-like activity alone, or in conjunction with a peptide having B7-1-like activity and/or B7-3-like activity. The transfected tumor cells are returned to the patient to result in expression of the peptides on the surface of the transfected cell. Alternatively, gene therapy techniques can be used to target a tumor cell for transfection in vivo.

[0322] The presence, on the surface of the tumor cell, of the protein or polypeptide encoded by the nucleic acids described above having the activity of a B lymphocyte antigen(s) provides the necessary costimulation signal to T cells to induce a T cell mediated immune response against the transfected tumor cells. In addition, tumor cells which lack or which fail to reexpress sufficient amounts of MHC class I or MHC class II molecules can be transfected with nucleic acids encoding all or a portion (e.g., a cytoplasmic-domain truncated portion) of an MHC class I α chain and β2 microglobulin or an MHC class II α chain and an MHC class II β chain to thereby express MHC class I or MHC class II proteins on the cell surface, respectively. Expression of the appropriate MHC class I or class II molecules in conjunction with a peptide having the activity of a B lymphocyte antigen (e.g., B7-1, B7-2, B7-3) induces a T cell mediated immune response against the transfected tumor cell. Optionally, a nucleic acid encoding an antisense construct which blocks expression of an MHC class II associated protein, such as the invariant chain, can also be cotransfected with a DNA encoding a protein or polypeptide having the activity of a B lymphocyte antigen to promote presentation of tumor associated antigens and induce tumor specific immunity. Thus, the induction of a T cell mediated immune response in a human subject may be sufficient to overcome tumor-specific tolerance in the subject. Alternatively, as described in more detail below, nucleic acids encoding these immune system regulator proteins or polypeptides or nucleic acids regulating the expression of such proteins or polypeptides may be introduced into appropriate host cells to increase or decrease the expression of the proteins as desired.

EXAMPLE 24 

Assaying the Expressed Proteins or Polypeptides for
Hematopoiesis Regulating Activity 

[0323] The proteins or polypeptides encoded by the nucleic acids described above may also be evaluated for their hematopoiesis regulating activity. For example, the effect of the proteins or polypeptides on embryonic stem cell differentiation may be evaluated. Numerous assays for such activity are familiar to those skilled in the art, including the assays disclosed in the following references, which are incorporated herein by reference: Johansson et al., Mol. Cell. Biol. 15:141-151, 1995; Keller et al., Mol. Cell. Biol. 13:473-486, 1993; McClanahan et al., Blood 81:2903-2915, 1993.
[0324] The proteins or polypeptides encoded by the nucleic acids described above may also be evaluated for their influence on the lifetime of stem cells and on stem cell differentiation. Numerous assays for such activity are familiar to those skilled in the art, including the assays disclosed in the following references, which are incorporated herein by reference: Freshney, M.G. Methylcellulose Colony Forming Assays, in Culture of Hematopoietic Cells, R.I. Freshney, et al. Eds. pp. 265-268, Wiley-Liss, Inc., New York, NY, 1994; Hirayama et al., Proc. Natl. Acad. Sci. USA 89:5907-5911, 1992; McNiece, I.K. and Briddell, R.A. Primitive Hematopoietic Colony Forming Cells with High Proliferative Potential, in Culture of Hematopoietic Cells, R.I. Freshney, et al. Eds. Vol pp. 23-39, Wiley-Liss, Inc., New York, NY, 1994; Neben et al., Experimental Hematology 22:353-359, 1994; Ploemacher, R.E. Cobblestone Area Forming Cell Assay, in Culture of Hematopoietic Cells, R.I. Freshney, et al. Eds. pp. 1-21, Wiley-Liss, Inc., New York, NY, 1994; Spooncer, E., Dexter, M. and Allen, T. Long Term Bone Marrow Cultures in the Presence of Stromal Cells, in Culture of Hematopoietic Cells, R.I. Freshney, et al. Eds. pp. 163-179, Wiley-Liss, Inc., New York, NY, 1994; and Sutherland, H.J. Long Term Culture Initiating Cell Assay, in Culture of Hematopoietic Cells, R.I. Freshney, et al. Eds. pp. 139-162, Wiley-Liss, Inc., New York, NY, 1994.
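Several of the cited stem cell assays, for example the long-term culture initiating cell (LTC-IC) assay, are commonly read out by limiting dilution: replicate wells are seeded at several cell doses, and the frequency of the assayed cell type is inferred from the fraction of wells that remain negative. The following is a minimal sketch, assuming single-hit Poisson statistics, in which the fraction of negative wells at a dose of N cells is exp(-f·N) for an unknown frequency f. The function name `estimate_frequency` is hypothetical, and the least-squares fit through the origin is a deliberate simplification that is not taken from the cited protocols; maximum-likelihood estimators are generally preferred for real data.

```python
import math
from typing import Sequence


def estimate_frequency(cells_per_well: Sequence[float],
                       fraction_negative: Sequence[float]) -> float:
    """Estimate the frequency f of assay-positive cells from limiting-dilution data.

    Hypothetical helper for illustration only. Assumes single-hit Poisson
    kinetics, F0 = exp(-f * N), so that -ln(F0) = f * N is a line through
    the origin, and fits its slope by ordinary least squares.
    """
    if len(cells_per_well) != len(fraction_negative):
        raise ValueError("cells_per_well and fraction_negative must have equal length")

    numerator = 0.0    # running sum of N * (-ln F0)
    denominator = 0.0  # running sum of N ** 2
    for n, f0 in zip(cells_per_well, fraction_negative):
        if not 0.0 <= f0 <= 1.0 or n < 0:
            raise ValueError("fractions must lie in [0, 1] and cell doses be non-negative")
        if f0 == 0.0:
            continue  # every well positive: -ln(F0) is undefined, so this dose is skipped
        numerator += n * -math.log(f0)
        denominator += n * n

    if denominator == 0.0:
        raise ValueError("no informative cell doses")
    return numerator / denominator
```

For example, `estimate_frequency([1000, 2000, 4000], [math.exp(-0.5), math.exp(-1.0), math.exp(-2.0)])` returns 0.0005 (up to floating-point rounding), i.e. roughly one initiating cell per 2,000 cells seeded.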

[0325] Those proteins or polypeptides which exhibit hematopoiesis regulatory activity may then be formulated as pharmaceuticals and used to treat clinical conditions in which regulation of hematopoiesis is beneficial. For example, a protein or polypeptide of the present invention may be useful in regulation of hematopoiesis and, consequently, in the treatment of myeloid or lymphoid cell deficiencies. Even marginal biological activity in support of colony forming cells or of factor-dependent cell lines indicates involvement in regulating hematopoiesis, e.g., in supporting the growth and proliferation of erythroid progenitor cells alone or in combination with other cytokines, thereby indicating utility, for example, in treating various anemias or for use in conjunction with irradiation/chemotherapy to stimulate the production of erythroid precursors and/or erythroid cells; in supporting the growth and proliferation of myeloid cells such as granulocytes and monocytes/macrophages (i.e., traditional CSF activity), useful, for example, in conjunction with chemotherapy to prevent or treat consequent myelosuppression; in supporting the growth and proliferation of megakaryocytes and consequently of platelets, thereby allowing prevention or treatment of various platelet disorders such as thrombocytopenia, and generally for use in place of or complementary to platelet transfusions; and/or in supporting the growth and proliferation of hematopoietic stem cells which are capable of maturing to any and all of the above-mentioned hematopoietic cells and therefore find therapeutic utility in various stem cell disorders (such as those usually treated with transplantation, including, without limitation, aplastic anemia and paroxysmal nocturnal hemoglobinuria), as well as in repopulating the stem cell compartment post irradiation/chemotherapy, either in vivo or ex vivo (i.e., in conjunction with bone marrow transplantation or with peripheral progenitor cell transplantation (homologous or heterologous)), as normal cells or genetically manipulated for gene therapy. Alternatively, as described in more detail below, nucleic acids encoding these proteins or polypeptides or nucleic acids regulating the expression of these proteins or polypeptides may be introduced into appropriate host cells to increase or decrease the expression of the proteins as desired.

EXAMPLE 25 

Assaying the Expressed Proteins or Polypeptides for 
Regulation of Tissue Growth

[0326] The proteins or polypeptides encoded by the nucleic acids described above may also be evaluated for their effect on tissue growth. Numerous assays for such activity are familiar to those skilled in the art, including the assays disclosed in International Patent Publication No. WO95/16035, International Patent Publication No. WO95/05846 and International Patent Publication No. WO91/07491, which are incorporated herein by reference.

[0327] Assays for wound healing activity include, without limitation, those described in: Winter, Epidermal Wound Healing, pp. 71-112 (Maibach, H.I. and Rovee, D.T., eds.), Year Book Medical Publishers, Inc., Chicago, as modified by Eaglstein and Mertz, J. Invest. Dermatol. 71:382-384 (1978), which are incorporated herein by reference.

[0328] Those proteins or polypeptides which are involved in the regulation of tissue growth may then be formulated as pharmaceuticals and used to treat clinical conditions in which regulation of tissue growth is beneficial. For example, a protein or polypeptide may have utility in compositions used for bone, cartilage, tendon, ligament and/or nerve tissue growth or regeneration, as well as for wound healing and tissue repair and replacement, and in the treatment of burns, incisions and ulcers.
[0329] A protein or polypeptide encoded by the nucleic acids described above, which induces cartilage and/or bone growth in circumstances where bone is not normally formed, has application in the healing of bone fractures and cartilage damage or defects in humans and other animals. Such a preparation employing a protein or polypeptide of the invention may have prophylactic use in closed as well as open fracture reduction and also in the improved fixation of artificial joints. De novo bone synthesis induced by an osteogenic agent contributes to the repair of congenital, trauma-induced, or oncologic resection-induced craniofacial defects, and is also useful in cosmetic plastic surgery.

[0330] A protein or polypeptide of this invention may also be used in the treatment of periodontal disease, and in other tooth repair processes. Such agents may provide an environment to attract bone-forming cells, stimulate growth of bone-forming cells or induce differentiation of progenitors of bone-forming cells. A protein of the invention may also be useful in the treatment of osteoporosis or osteoarthritis, such as through stimulation of bone and/or cartilage repair or by blocking inflammation or processes of tissue destruction (collagenase activity, osteoclast activity, etc.) mediated by inflammatory processes.

[0331] Another category of tissue regeneration activity that may be attributable to the proteins or polypeptides encoded by the nucleic acids described above is tendon/ligament formation. A protein or polypeptide encoded by the nucleic acids described above, which induces tendon/ligament-like tissue or other tissue formation in circumstances where such tissue is not normally formed, has application in the healing of tendon or ligament tears, deformities and other tendon or ligament defects in humans and other animals. Such a preparation employing a tendon/ligament-like tissue inducing protein may have prophylactic use in preventing damage to tendon or ligament tissue, as well as use in the improved fixation of tendon or ligament to bone or other tissues, and in repairing defects to tendon or ligament tissue. De novo tendon/ligament-like tissue formation induced by a protein or polypeptide of the present invention contributes to the repair of tendon or ligament defects of congenital, traumatic or other origin and is also useful in cosmetic plastic surgery for attachment or repair of tendons or ligaments. The proteins or polypeptides of the present invention may provide an environment to attract tendon- or ligament-forming cells, stimulate growth of tendon- or ligament-forming cells, induce differentiation of progenitors of tendon- or ligament-forming cells, or induce growth of tendon/ligament cells or progenitors ex vivo for return in vivo to effect tissue repair. The proteins or polypeptides of the invention may also be useful in the treatment of tendinitis, carpal tunnel syndrome and other tendon or ligament defects. The therapeutic compositions may also include an appropriate matrix and/or sequestering agent as a carrier, as is well known in the art.

[0332] The proteins or polypeptides of the present invention may also be useful for proliferation of neural cells and for regeneration of nerve and brain tissue, i.e., for the treatment of central and peripheral nervous system diseases and neuropathies, as well as mechanical and traumatic disorders, which involve degeneration, death or trauma to neural cells or nerve tissue. More specifically, a protein or polypeptide may be used in the treatment of diseases of the peripheral nervous system, such as peripheral nerve injuries, peripheral neuropathy and localized neuropathies, and central nervous system diseases, such as Alzheimer's disease, Parkinson's disease, Huntington's disease, amyotrophic lateral sclerosis, and Shy-Drager syndrome. Further conditions which may be treated in accordance with the present invention include mechanical and traumatic disorders, such as spinal cord disorders, head trauma and cerebrovascular diseases such as stroke. Peripheral neuropathies resulting from chemotherapy or other medical therapies may also be treatable using a protein or polypeptide of the invention.
[0333] Proteins or polypeptides of the invention may also be useful to promote better or faster closure of non-healing wounds, including without limitation pressure ulcers, ulcers associated with vascular insufficiency, surgical and traumatic wounds, and the like.
[0334] It is expected that a protein or polypeptide of the present invention may also exhibit activity for generation or regeneration of other tissues, such as organs (including, for example, pancreas, liver, intestine, kidney, skin, endothelium), muscle (smooth, skeletal or cardiac) and vascular (including vascular endothelium) tissue, or for promoting the growth of cells comprising such tissues. Part of the desired effects may be by inhibition or modulation of fibrotic scarring to allow normal tissue to regenerate. A protein or polypeptide of the invention may also exhibit angiogenic activity.
[0335] A protein or polypeptide of the present invention may also be useful for gut protection or regeneration and treatment of lung or liver fibrosis, reperfusion injury in various tissues, and conditions resulting from systemic cytokine damage.

[0336] A protein or polypeptide of the present invention may also be useful for promoting or inhibiting differentiation of tissues described above from precursor tissues or cells, or for inhibiting the growth of tissues described above.

[0337] Alternatively, as described in more detail below, nucleic acids encoding proteins or polypeptides with tissue growth regulating activity, or nucleic acids regulating the expression of such proteins or polypeptides, may be introduced into appropriate host cells to increase or decrease the expression of the proteins as desired.

EXAMPLE 26

Assaying the Expressed Proteins or Polypeptides for 
Regulation of Reproductive Hormones 

[0338] The proteins or polypeptides of the present invention may also be evaluated for their ability to regulate reproductive hormones, such as follicle stimulating hormone. Numerous assays for such activity are familiar to those skilled in the art, including the assays disclosed in the following references, which are incorporated herein by reference: Vale et al., Endocrinol. 91:562-572, 1972; Ling et al., Nature 321:779-782, 1986; Vale et al., Nature 321:776-779, 1986; Mason et al., Nature 318:659-663, 1985; Forage et al., Proc. Natl. Acad. Sci. USA 83:3091-3095, 1986; Chapter 6.12 in Current Protocols in Immunology, J.E. Coligan et al. Eds. Greene Publishing Associates and Wiley-Interscience; Taub et al., J. Clin. Invest. 95:1370-1376, 1995; Lind et al., APMIS 103:140-146, 1995; Muller et al., Eur. J. Immunol. 25:1744-1748; Gruber et al., J. Immunol. 152:5860-5867, 1994; Johnston et al., J. Immunol. 153:1762-1768, 1994.

[0339] Those proteins or polypeptides which exhibit activity as reproductive hormones or regulators of cell movement may then be formulated as pharmaceuticals and used to treat clinical conditions in which regulation of reproductive hormones is beneficial. For example, a protein or polypeptide may exhibit activin- or inhibin-related activities. Inhibins are characterized by their ability to inhibit the release of follicle stimulating hormone (FSH), while activins are characterized by their ability to stimulate the release of FSH. Thus, a protein or polypeptide of the present invention, alone or in heterodimers with a member of the inhibin family, may be useful as a contraceptive based on the ability of inhibins to decrease fertility in female mammals and decrease spermatogenesis in male mammals. Administration of sufficient amounts of other inhibins can induce infertility in these mammals. Alternatively, the protein or polypeptide of the invention, as a homodimer or as a heterodimer with other protein subunits of the inhibin-β group, may be useful as a fertility inducing therapeutic, based upon the ability of activin molecules to stimulate FSH release from cells of the anterior pituitary. See, for example, United States Patent 4,798,885, the disclosure of which is incorporated herein by reference. A protein or polypeptide of the invention may also be useful for advancement of the onset of fertility in sexually immature mammals, so as to increase the lifetime reproductive performance of domestic animals such as cows, sheep and pigs.

[0340] Alternatively, as described in more detail below, nucleic acids encoding proteins or polypeptides with reproductive hormone regulating activity, or nucleic acids regulating the expression of such proteins or polypeptides, may be introduced into appropriate host cells to increase or decrease the expression of the proteins or polypeptides as desired.



EXAMPLE 27

Assaying the Expressed Proteins or Polypeptides for
Chemotactic/Chemokinetic Activity

[0341] The proteins or polypeptides of the present invention may also be evaluated for chemotactic/chemokinetic activity. For example, a protein or polypeptide of the present invention may have chemotactic or chemokinetic activity (e.g., act as a chemokine) for mammalian cells, including, for example, monocytes, fibroblasts, neutrophils, T cells, mast cells, eosinophils, epithelial and/or endothelial cells. Chemotactic and chemokinetic proteins or polypeptides can be used to mobilize or attract a desired cell population to a desired site of action. Chemotactic or chemokinetic proteins or polypeptides provide particular advantages in treatment of wounds and other trauma to tissues, as well as in treatment of localized infections. For example, attraction of lymphocytes, monocytes or neutrophils to tumors or sites of infection may result in improved immune responses against the tumor or infecting agent.

[0342] A protein or polypeptide has chemotactic activity for a particular cell population if it can stimulate, directly or indirectly, the directed orientation or movement of such cell population. Preferably, the protein or polypeptide has the ability to directly stimulate directed movement of cells. Whether a particular protein or polypeptide has chemotactic activity for a population of cells can be readily determined by employing such protein or polypeptide in any known assay for cell chemotaxis.

[0343] The activity of a protein or polypeptide of the invention may, among other means, be measured by the following methods:

[0344] Assays for chemotactic activity (which will identify proteins or polypeptides that induce or prevent chemotaxis) consist of assays that measure the ability of a protein or polypeptide to induce the migration of cells across a membrane as well as the ability of a protein or polypeptide to induce the adhesion of one cell population to another cell population. Suitable assays for movement and adhesion include, without limitation, those described in: Current Protocols in Immunology, Ed by J.E. Coligan, A.M. Kruisbeek, D.H. Margulies, E.M. Shevach, W. Strober, Pub. Greene Publishing Associates and Wiley-Interscience, Chapter 6.12: 6.12.1-6.12.28; Taub et al., J. Clin. Invest. 95:1370-1376, 1995; Lind et al., APMIS 103:140-146, 1995; Mueller et al., Eur. J. Immunol. 25:1744-1748; Gruber et al., J. Immunol. 152:5860-5867, 1994; Johnston et al., J. Immunol. 153:1762-1768, 1994.

EXAMPLE 28

Assaying the Expressed Proteins or Polypeptides for
Regulation of Blood Clotting

[0345] The proteins or polypeptides of the present invention may also be evaluated for their effects on blood clotting. Numerous assays for such activity are familiar to those skilled in the art, including the assays disclosed in the following references, which are incorporated herein by reference: Linet et al., J. Clin. Pharmacol. 26:131-140, 1986; Burdick et al., Thrombosis Res. 45:413-419, 1987; Humphrey et al., Fibrinolysis 5:71-79 (1991); Schaub, Prostaglandins 35:467-474, 1988.
[0346] Those proteins or polypeptides which are involved in the regulation of blood clotting may then be formulated as pharmaceuticals and used to treat clinical conditions in which regulation of blood clotting is beneficial. For example, a protein or polypeptide of the invention may also exhibit hemostatic or thrombolytic activity. As a result, such a protein or polypeptide is expected to be useful in treatment of various coagulation disorders (including hereditary disorders, such as hemophilias) or to enhance coagulation and other hemostatic events in treating wounds resulting from trauma, surgery or other causes. A protein or polypeptide of the invention may also be useful for dissolving or inhibiting formation of thromboses and for treatment and prevention of conditions resulting therefrom (such as infarction of cardiac and central nervous system vessels (e.g., stroke)). Alternatively, as described in more detail below, nucleic acids encoding proteins or polypeptides with blood clotting activity, or nucleic acids regulating the expression of such proteins or polypeptides, may be introduced into appropriate host cells to increase or decrease the expression of the proteins or polypeptides as desired.

EXAMPLE 29 

Assaying the Expressed Proteins or Polypeptides for 
Involvement in Receptor/Ligand Interactions

[0347] The proteins or polypeptides of the present invention may also be evaluated for their involvement in receptor/ligand interactions. Numerous assays for such involvement are familiar to those skilled in the art, including the assays disclosed in the following references, which are incorporated herein by reference: Chapter 7 (7.28.1-728.72) in Current Protocols in Immunology, J.E. Coligan et al. Eds. Greene Publishing Associates and Wiley-Interscience; Takai et al., Proc. Natl. Acad. Sci. USA 84:6864-6868, 1987; Bierer et al., J. Exp. Med. 168:1145-1156, 1988; Rosenstein et al., J. Exp. Med. 169:149-160, 1989; Stoltenborg et al., J. Immunol. Methods 175:59-68, 1994; Stitt et al., Cell 80:661-670, 1995; Gyuris et al., Cell 75:791-803, 1993.
[0348] For example, the proteins or polypeptides of the present invention may also demonstrate activity as receptors, receptor ligands or inhibitors or agonists of receptor/ligand interactions. Examples of such receptors and ligands include, without limitation, cytokine receptors and their ligands, receptor kinases and their ligands, receptor phosphatases and their ligands, receptors involved in cell-cell interactions and their ligands (including, without limitation, cellular adhesion molecules (such as selectins, integrins and their ligands) and receptor/ligand pairs involved in antigen presentation, antigen recognition and development of cellular and humoral immune responses). Receptors and ligands are also useful for screening of potential peptide or small molecule inhibitors of the relevant receptor/ligand interaction. A protein or polypeptide of the present invention (including, without limitation, fragments of receptors and ligands) may be useful as an inhibitor of receptor/ligand interactions. Alternatively, as described in more detail below, nucleic acids encoding proteins or polypeptides involved in receptor/ligand interactions, or nucleic acids regulating the expression of such proteins or polypeptides, may be introduced into appropriate host cells to increase or decrease the expression of the proteins or polypeptides as desired.

EXAMPLE 30 

Assaying the Proteins or Polypeptides for
Anti-Inflammatory Activity

[0349] The proteins or polypeptides of the present invention may also be evaluated for anti-inflammatory activity. The anti-inflammatory activity may be achieved by providing a stimulus to cells involved in the inflammatory response, by inhibiting or promoting cell-cell interactions (such as, for example, cell adhesion), by inhibiting or promoting chemotaxis of cells involved in the inflammatory process, by inhibiting or promoting cell extravasation, or by stimulating or suppressing production of other factors which more directly inhibit or promote an inflammatory response. Proteins or polypeptides exhibiting such activities can be used to treat inflammatory conditions, including chronic or acute conditions, including without limitation inflammation associated with infection (such as septic shock, sepsis or systemic inflammatory response syndrome), ischemia-reperfusion injury, endotoxin lethality, arthritis, complement-mediated hyperacute rejection, nephritis, cytokine- or chemokine-induced lung injury, inflammatory bowel disease, Crohn's disease, or inflammation resulting from overproduction of cytokines such as TNF or IL-1. Proteins or polypeptides of the invention may also be useful to treat anaphylaxis and hypersensitivity to an antigenic substance or material. Alternatively, as described in more detail below, nucleic acids encoding proteins or polypeptides with anti-inflammatory activity, or nucleic acids regulating the expression of such proteins or polypeptides, may be introduced into appropriate host cells to increase or decrease the expression of the proteins or polypeptides as desired.

EXAMPLE 31 

Assaying the Expressed Proteins or Polypeptides for
Tumor Inhibition Activity 

[0350] The proteins or polypeptides of the present invention may also be evaluated for tumor inhibition activity. In addition to the activities described above for immunological treatment or prevention of tumors, a protein or polypeptide of the invention may exhibit other antitumor activities. A protein or polypeptide may inhibit tumor growth directly or indirectly (such as, for example, via antibody-dependent cell-mediated cytotoxicity (ADCC)). A protein or polypeptide may exhibit its tumor inhibitory activity by acting on tumor tissue or tumor precursor tissue, by inhibiting formation of tissues necessary to support tumor growth (such as, for example, by inhibiting angiogenesis), by causing production of other factors, agents or cell types which inhibit tumor growth, or by suppressing, eliminating or inhibiting factors, agents or cell types which promote tumor growth. Alternatively, as described in more detail below, nucleic acids encoding proteins or polypeptides with tumor inhibition activity, or nucleic acids regulating the expression of such proteins or polypeptides, may be introduced into appropriate host cells to increase or decrease the expression of the proteins or polypeptides as desired.
[0351] A protein or polypeptide of the invention may also exhibit one or more of the following additional activities or effects: inhibiting the growth, infection or function of, or killing, infectious agents, including, without limitation, bacteria, viruses, fungi and other parasites; affecting (suppressing or enhancing) bodily characteristics, including, without limitation, height, weight, hair color, eye color, skin, fat to lean ratio or other tissue pigmentation, or organ or body part size or shape (such as, for example, breast augmentation or diminution, change in bone form or shape); affecting biorhythms or circadian cycles or rhythms; affecting the fertility of male or female subjects; affecting the metabolism, catabolism, anabolism, processing, utilization, storage or elimination of dietary fat, lipid, protein, carbohydrate, vitamins, minerals, cofactors or other nutritional factors or components; affecting behavioral characteristics, including without limitation, appetite, libido, stress, cognition (including cognitive disorders), depression (including depressive disorders) and violent behaviors; providing analgesic effects or other pain reducing effects; promoting differentiation and growth of embryonic stem cells in lineages other than hematopoietic lineages; hormonal or endocrine activity; in the case of enzymes, correcting deficiencies of the enzyme and treating deficiency-related diseases; treatment of hyperproliferative disorders (such as, for example, psoriasis); immunoglobulin-like activity (such as, for example, the ability to bind antigens or complement); and the ability to act as an antigen in a vaccine composition to raise an immune response against such protein or another material or entity which is cross-reactive with such protein. Alternatively, as described in more detail below, nucleic acids encoding proteins or polypeptides involved in any of the above-mentioned activities, or nucleic acids regulating the expression of such proteins, may be introduced into appropriate host cells to increase or decrease the expression of the proteins or polypeptides as desired.



EXAMPLE 32 



Identification of Proteins or Polypeptides which Interact 
with Proteins or Polypeptides of the Present Invention 



[0352] Proteins or polypeptides which interact with the proteins or polypeptides of the present invention, such as receptor proteins, may be identified using two-hybrid systems such as the Matchmaker Two Hybrid System 2 (Catalog No. K1604-1, Clontech). As described in the manual accompanying the kit, which is incorporated herein by reference, nucleic acids encoding the proteins or polypeptides of the present invention are inserted into an expression vector such that they are in frame with DNA encoding the DNA binding domain of the yeast transcriptional activator GAL4. cDNAs in a cDNA library which encode proteins or polypeptides that might interact with the proteins or polypeptides of the present invention are inserted into a second expression vector such that they are in frame with DNA encoding the activation domain of GAL4. The two expression plasmids are transformed into yeast, and the yeast are plated on selection medium which selects for expression of the selectable markers on each of the expression vectors as well as for GAL4-dependent expression of the HIS3 gene. Transformants capable of growing on medium lacking histidine are screened for GAL4-dependent lacZ expression. Cells which are positive in both the histidine selection and the lacZ assay contain plasmids encoding proteins or polypeptides which interact with the proteins or polypeptides of the present invention.
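The scoring rule above (a transformant counts only if it is positive for both GAL4-dependent reporters) can be expressed as a simple filter. This is a minimal sketch under stated assumptions: the `Transformant` record, its field names, and the colony identifiers are hypothetical and are not part of the Clontech protocol.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Transformant:
    """One yeast colony carrying both expression plasmids (hypothetical record)."""
    colony_id: str
    grows_without_histidine: bool  # readout of the GAL4-dependent HIS3 reporter
    lacz_positive: bool            # readout of the GAL4-dependent lacZ reporter


def interacting_candidates(transformants: Iterable[Transformant]) -> List[str]:
    """Return IDs of colonies positive in BOTH the histidine selection and the lacZ assay."""
    return [
        t.colony_id
        for t in transformants
        if t.grows_without_histidine and t.lacz_positive
    ]


if __name__ == "__main__":
    plate = [
        Transformant("c1", True, True),
        Transformant("c2", True, False),   # HIS3 positive only: not scored as an interaction
        Transformant("c3", False, False),
    ]
    print(interacting_candidates(plate))
```

Requiring both reporters is what filters out colonies that activate only one promoter for reasons unrelated to the protein-protein interaction.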

[0353] Alternatively, the system described in Lustig et al., Methods in Enzymology 283:83-99 (1997), the disclosure of which is incorporated herein by reference, may be used for identifying molecules which interact with the proteins or polypeptides of the present invention. In such systems, in vitro transcription reactions are performed on a pool of vectors containing nucleic acid inserts which encode the proteins or polypeptides of the present invention. The nucleic acid inserts are cloned downstream of a promoter which drives in vitro transcription. The resulting pools of mRNAs are introduced into Xenopus laevis oocytes, and the oocytes are then assayed for a desired activity.

[0354] Alternatively, the pooled in vitro transcription products produced as described above may be translated in vitro. The pooled in vitro translation products can be assayed for a desired activity or for interaction with a known protein or polypeptide.
[0355] Proteins, polypeptides or other molecules interacting with proteins or polypeptides of the present invention can be found by a variety of additional techniques. In one method, affinity columns containing the protein or polypeptide of the present invention can be constructed. In some versions of this method, the affinity column contains chimeric proteins in which the protein or polypeptide of the present invention is fused to glutathione S-transferase. A mixture of cellular proteins, or a pool of expressed proteins as described above, is applied to the affinity column. Molecules interacting with the protein or polypeptide attached to the column can then be isolated and analyzed on 2-D electrophoresis gels as described in Ramunsen et al., Electrophoresis, 18, 588-598 (1997), the disclosure of which is incorporated herein by reference. Alternatively, the molecules retained on the affinity column can be purified by electrophoresis-based methods and sequenced. The same method can be used to isolate antibodies, to screen phage display products, or to screen phage-displayed human antibodies.

[0356] Molecules interacting with the proteins or polypeptides of the present invention can also be screened by using an Optical Biosensor as described in Edwards & Leatherbarrow, Analytical Biochemistry, 246, 1-6 (1997), the disclosure of which is incorporated herein by reference. The main advantage of the method is that it allows the determination of the association rate between the protein or polypeptide and other interacting molecules. Thus, it is possible to specifically select interacting molecules with a high or low association rate. Typically, a target molecule is linked to the sensor surface (through a carboxymethyl dextran matrix) and a sample of test molecules is placed in contact with the target molecules. The binding of a test molecule to the target molecule causes a change in the refractive index and/or thickness.
[0357] This change is detected by the Biosensor provided it occurs in the evanescent field (which extends a few hundred nanometers from the sensor surface). In these screening assays, the target molecule can be one of the proteins or polypeptides of the present invention and the test sample can be a collection of proteins, polypeptides or other molecules extracted from tissues or cells, a pool of expressed proteins, combinatorial peptide and/or chemical libraries, or phage-displayed peptides. The tissues or cells from which the test molecules are extracted can originate from any species.
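Association rates are usually extracted from biosensor sensorgrams by fitting a kinetic model. The sketch below assumes the simplest case, a 1:1 (Langmuir) interaction, for which the observed binding rate at analyte concentration C is k_obs = ka·C + kd. A straight-line fit of k_obs against C then gives ka (slope), kd (intercept), and the equilibrium dissociation constant KD = kd/ka. The model choice, function names, and numbers are illustrative assumptions and are not taken from Edwards & Leatherbarrow.

```python
import math
from typing import Sequence, Tuple


def fit_association_rate(
    concentrations_m: Sequence[float], k_obs_per_s: Sequence[float]
) -> Tuple[float, float, float]:
    """Least-squares fit of k_obs = ka * C + kd (assumed 1:1 binding model).

    Returns (ka in M^-1 s^-1, kd in s^-1, KD = kd / ka in M).
    """
    n = len(concentrations_m)
    if n != len(k_obs_per_s) or n < 2:
        raise ValueError("need at least two paired (C, k_obs) points")
    mean_c = sum(concentrations_m) / n
    mean_k = sum(k_obs_per_s) / n
    sxx = sum((c - mean_c) ** 2 for c in concentrations_m)
    if sxx == 0:
        raise ValueError("analyte concentrations must differ")
    sxy = sum((c - mean_c) * (k - mean_k) for c, k in zip(concentrations_m, k_obs_per_s))
    ka = sxy / sxx                # slope: association rate constant
    kd = mean_k - ka * mean_c     # intercept: dissociation rate constant
    return ka, kd, kd / ka


def predicted_response(t_s: float, c_m: float, ka: float, kd: float, r_max: float) -> float:
    """Association-phase signal R(t) = Req * (1 - exp(-k_obs * t)) under the same model."""
    k_obs = ka * c_m + kd
    r_eq = ka * c_m * r_max / k_obs
    return r_eq * (1.0 - math.exp(-k_obs * t_s))


if __name__ == "__main__":
    # Hypothetical k_obs values measured at three analyte concentrations.
    ka, kd, KD = fit_association_rate([10e-9, 50e-9, 100e-9], [2e-3, 6e-3, 1.1e-2])
    print(f"ka={ka:.3g} 1/(M*s), kd={kd:.3g} 1/s, KD={KD:.3g} M")
```

Ranking candidates by the fitted ka is one way to "select interacting molecules with a high or low association rate"; real data often deviate from the 1:1 model (mass transport, heterogeneity), so the fit should be checked against the raw sensorgrams.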
[0358] In other methods, a target protein or polypeptide is immobilized and the test population is a collection of unique proteins or polypeptides of the present invention.
To study the interaction of the proteins or polypeptides of the present invention with drugs, the microdialysis coupled to HPLC method described by Wang et al., Chromatographia, 44, 205-208 (1997) or the affinity capillary electrophoresis method described by Busch et al., J. Chromatogr. 777:311-328 (1997), the disclosures of which are incorporated herein by reference, can be used.
[0359] The system described in U.S. Patent No. 5,654,150, the disclosure of which is incorporated herein by reference, may also be used to identify molecules which interact with the proteins or polypeptides of the present invention. In this system, pools of nucleic acids encoding the proteins or polypeptides of the present invention are transcribed and translated in vitro and the reaction products are assayed for interaction with a known polypeptide or antibody.
[0360] It will be appreciated by those skilled in the art that the proteins or polypeptides of the present invention may be assayed for numerous activities in addition to those specifically enumerated above. For example, the expressed proteins or polypeptides may be evaluated for applications involving control and regulation of inflammation, tumor proliferation or metastasis, infection, or other clinical conditions. In addition, the proteins or polypeptides may be useful as nutritional agents or cosmetic agents.



Epitopes and Antibody Fusions 

[0361] A preferred embodiment of the present invention is directed to epitope-bearing polypeptides and epitope-bearing polypeptide fragments. These epitopes may be "antigenic epitopes" or both an "antigenic epitope" and an "immunogenic epitope". An "immunogenic epitope" is defined as a part of a protein that elicits an antibody response in vivo when the polypeptide is the immunogen. On the other hand, a region of a polypeptide to which an antibody binds is defined as an "antigenic determinant" or "antigenic epitope." The number of immunogenic epitopes of a protein generally is less than the number of antigenic epitopes. See, e.g., Geysen, et al. (1984) Proc. Natl. Acad. Sci. USA 81:3998-4002. It is particularly noted that although a particular epitope may not be immunogenic, it is nonetheless useful since antibodies can be made in vitro to any epitope.

[0362] An epitope can comprise as few as 3 amino acids in a spatial conformation which is unique to the epitope. Generally an epitope consists of at least 6 such amino acids, and more often at least 8-10 such amino acids. In a preferred embodiment, antigenic epitopes comprise a number of amino acids that is any integer between 3 and 50. Fragments which function as epitopes may be produced by any conventional means. See, e.g., Houghten, R. A., Proc. Natl. Acad. Sci. USA 82:5131-5135 (1985), further described in U.S. Patent No. 4,631,211. Methods for determining the amino acids which make up an immunogenic epitope include x-ray crystallography, 2-dimensional nuclear magnetic resonance, and epitope mapping, e.g., the Pepscan method described by H. Mario Geysen et al. (1984) Proc. Natl. Acad. Sci. U.S.A. 81:3998-4002; PCT Publication No. WO 84/03564; and PCT Publication No. WO 84/03506. Another example is the algorithm of Jameson and Wolf, Comp. Appl. Biosci. 4:181-186 (1988) (said references incorporated by reference in their entireties). The Jameson-Wolf antigenic analysis, for example, may be performed using the computer program PROTEAN, using default parameters (Version 4.0 Windows, DNASTAR, Inc., 1228 South Park Street, Madison, WI).
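The Jameson-Wolf antigenic index combines several residue-level predictions (hydrophilicity, surface probability, backbone flexibility and secondary structure), and reproducing the PROTEAN implementation is beyond a short example. As a simpler illustration of the same idea of scanning a sequence for likely surface-exposed epitopes, the sketch below substitutes the Hopp and Woods hydrophilicity scale with a 6-residue sliding window. This is a swapped-in, single-parameter method, not the Jameson-Wolf algorithm, and the function names and test sequence are hypothetical.

```python
from typing import List, Tuple

# Hopp and Woods (1981) hydrophilicity values for the 20 standard residues.
HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0,
    "S": 0.3, "N": 0.2, "Q": 0.2, "G": 0.0, "P": 0.0,
    "T": -0.4, "A": -0.5, "H": -0.5, "C": -1.0, "M": -1.3,
    "V": -1.5, "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5, "W": -3.4,
}


def hydrophilicity_profile(seq: str, window: int = 6) -> List[float]:
    """Mean hydrophilicity of every window of `window` consecutive residues."""
    seq = seq.upper()
    if not 1 <= window <= len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [HOPP_WOODS[aa] for aa in seq]  # KeyError for non-standard residues
    return [sum(scores[i:i + window]) / window for i in range(len(seq) - window + 1)]


def most_hydrophilic_window(seq: str, window: int = 6) -> Tuple[int, int, float]:
    """Return (start, end, mean score) of the highest-scoring window, 1-based inclusive."""
    profile = hydrophilicity_profile(seq, window)
    best = max(range(len(profile)), key=profile.__getitem__)  # first maximum wins
    return best + 1, best + window, profile[best]


if __name__ == "__main__":
    print(most_hydrophilic_window("AAAAAADEKRKDAAAAAA"))
```

A peak in such a profile only suggests a candidate region; as the text notes, candidate epitopes would still be confirmed experimentally, for example by Pepscan-style mapping.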
[0363] The epitope-bearing fragments of the present invention preferably comprise 6 to 50 amino acids (i.e., any integer between 6 and 50, inclusive) of a polypeptide of the present invention. Also included in the present invention are antigenic fragments of any length between 6 amino acids and the full-length sequence in the sequence listing. All combinations of sequences between the integers of 6 and the full-length sequence of a polypeptide of the present invention are included. The epitope-bearing fragments may be specified by either the number of contiguous amino acid residues (as a sub-genus) or by specific N-terminal and C-terminal positions (as species) as described above for the polypeptide fragments of the present invention. Any number of epitope-bearing fragments of the present invention may also be excluded in the same manner.
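Specifying fragments "by specific N-terminal and C-terminal positions" amounts to enumerating every contiguous window of each allowed length. The following sketch, with hypothetical function names and an arbitrary 10-residue test sequence, lists fragments of 6 to 50 residues as (N-terminal, C-terminal) pairs and counts them with the closed form: for a sequence of length L there are L - k + 1 fragments of length k.

```python
from typing import Iterator, Tuple


def epitope_bearing_fragments(
    seq: str, min_len: int = 6, max_len: int = 50
) -> Iterator[Tuple[int, int, str]]:
    """Yield (N-terminal position, C-terminal position, fragment), 1-based inclusive."""
    max_len = min(max_len, len(seq))
    for length in range(min_len, max_len + 1):
        for start in range(len(seq) - length + 1):
            yield start + 1, start + length, seq[start:start + length]


def count_fragments(seq_len: int, min_len: int = 6, max_len: int = 50) -> int:
    """Number of contiguous fragments with lengths in [min_len, max_len]."""
    max_len = min(max_len, seq_len)
    return sum(seq_len - k + 1 for k in range(min_len, max_len + 1))


if __name__ == "__main__":
    frags = list(epitope_bearing_fragments("ACDEFGHIKL"))
    print(len(frags), frags[0], count_fragments(100))
```

The count grows roughly quadratically with sequence length, which is why the text describes the fragments generically (as a sub-genus by length, or as species by end positions) rather than listing them.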
[0364] Antigenic epitopes are useful, for example, to raise antibodies, including monoclonal antibodies, that specifically bind the epitope (see Wilson et al., 1984; and Sutcliffe, J. G. et al., 1983). The antibodies are then used in various techniques such as diagnostic and tissue/cell identification techniques, as described herein, and in purification methods.

[0365] Similarly, immunogenic epitopes can be used to induce antibodies according to methods well known in the art (see Sutcliffe et al., supra; Wilson et al., supra; Chow, M. et al. (1985); and Bittle, F. J. et al. (1985)). A preferred immunogenic epitope includes the polypeptides of the sequence listing. The immunogenic epitopes may be presented together with a carrier protein, such as an albumin, to an animal system (such as rabbit or mouse) or, if the epitope is long enough (at least about 25 amino acids), without a carrier. However, immunogenic epitopes comprising as few as 8 to 10 amino acids have been shown to be sufficient to raise antibodies capable of binding to, at the very least, linear epitopes in a denatured polypeptide (e.g., in Western blotting).
[0366] Epitope-bearing polypeptides of the present invention are used to induce antibodies according to methods well known in the art including, but not limited to, in vivo immunization, in vitro immunization, and phage display methods (see, e.g., Sutcliffe et al., supra; Wilson et al., supra; and Bittle et al., 1985). If in vivo immunization is used, animals may be immunized with free peptide; however, the anti-peptide antibody titer may be boosted by coupling the peptide to a macromolecular carrier, such as keyhole limpet hemocyanin (KLH) or tetanus toxoid. For instance, peptides containing cysteine residues may be coupled to a carrier using a linker such as m-maleimidobenzoyl-N-hydroxysuccinimide ester (MBS), while other peptides may be coupled to carriers using a more general linking agent such as glutaraldehyde. Animals such as rabbits, rats and mice are immunized with either free or carrier-coupled peptides, for instance, by intraperitoneal and/or intradermal injection of emulsions containing about 100 µg of peptide or carrier protein and Freund's adjuvant. Several booster injections may be needed, for instance, at intervals of about two weeks, to provide a useful titer of anti-peptide antibody, which can be detected, for example, by ELISA assay using free peptide adsorbed to a solid surface. The titer of anti-peptide antibodies in serum from an immunized animal may be increased by selection of anti-peptide antibodies, for instance, by adsorption to the peptide on a solid support and elution of the selected antibodies according to methods well known in the art.
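The text does not say how an ELISA result is converted into a titer. One common convention, assumed here, is the endpoint titer: the reciprocal of the highest serum dilution whose optical density (OD) still exceeds a positivity cut-off, with the cut-off taken as the mean of pre-immune (negative) wells plus three standard deviations. The function names, the 3-SD rule and the example readings are hypothetical.

```python
from statistics import mean, stdev
from typing import Optional, Sequence


def cutoff_from_preimmune(od_values: Sequence[float], n_sd: float = 3.0) -> float:
    """Positivity cut-off: mean OD of pre-immune wells plus n_sd standard deviations."""
    return mean(od_values) + n_sd * stdev(od_values)  # stdev needs >= 2 readings


def endpoint_titer(
    dilution_factors: Sequence[int], od_values: Sequence[float], cutoff: float
) -> Optional[int]:
    """Reciprocal of the highest dilution still above the cut-off.

    Walks the series from least to most dilute and stops at the first well
    at or below the cut-off. Returns None if even the least dilute well is negative.
    """
    titer = None
    for dilution, od in sorted(zip(dilution_factors, od_values)):
        if od > cutoff:
            titer = dilution
        else:
            break
    return titer


if __name__ == "__main__":
    cut = cutoff_from_preimmune([0.08, 0.10, 0.12])
    dilutions = [100, 500, 2500, 12500, 62500]
    ods = [2.1, 1.8, 1.1, 0.45, 0.12]   # hypothetical post-boost readings
    print(round(cut, 3), endpoint_titer(dilutions, ods, cut))
```

Comparing titers from bleeds taken after successive boosts is one way to judge when a "useful titer" has been reached.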

[0367] As one of skill in the art will appreciate, and as discussed above, the polypeptides of the present invention comprising an immunogenic or antigenic epitope can be fused to heterologous polypeptide sequences. For example, the polypeptides of the present invention may be fused with the constant domain of immunoglobulins (IgA, IgE, IgG, IgM), or portions thereof (CH1, CH2, CH3, or any combination thereof, including both entire domains and portions thereof), resulting in chimeric polypeptides. These fusion proteins facilitate purification and show an increased half-life in vivo. This has been shown, e.g., for chimeric proteins consisting of the first two domains of the human CD4 polypeptide and various domains of the constant regions of the heavy or light chains of mammalian immunoglobulins (see, e.g., EPA 0,394,827; and Traunecker et al., 1988). Fusion proteins that have a disulfide-linked dimeric structure due to the IgG portion can also be more efficient in binding and neutralizing other molecules than monomeric polypeptides or fragments thereof alone (see, e.g., Fountoulakis et al., 1995). Nucleic acids encoding the above epitopes can also be recombined with a gene of interest as an epitope tag to aid in detection and purification of the expressed polypeptide.
[0368] Additional fusion proteins of the invention may be generated through the techniques of gene-shuffling, motif-shuffling, exon-shuffling, or codon-shuffling (collectively referred to as "DNA shuffling"). DNA shuffling may be employed to modulate the activities of polypeptides of the present invention, thereby effectively generating agonists and antagonists of the polypeptides. See, for example, U.S. Patent Nos. 5,605,793; 5,811,238; 5,834,252; 5,837,458; and Patten, P.A., et al. (1997); Harayama, S. (1998); Hansson, L.O., et al. (1999); and Lorenzo, M.M. and Blasco, R. (1998). (Each of these documents is hereby incorporated by reference.) In one embodiment, one or more components, motifs, sections, parts, domains, fragments, etc., of coding polynucleotides of the invention, or the polypeptides encoded thereby, may be recombined with one or more components, motifs, sections, parts, domains, fragments, etc. of one or more heterologous molecules.

Antibodies 



[0369] The present invention further relates to antibodies and T-cell antigen receptors (TCR) which specifically bind the polypeptides of the present invention. The antibodies of the present invention include IgG (including IgG1, IgG2, IgG3, and IgG4), IgA (including IgA1 and IgA2), IgD, IgE, IgM, and IgY. As used herein, the term "antibody" (Ab) is meant to include whole antibodies, including single-chain whole antibodies, and antigen-binding fragments thereof. In a preferred embodiment, the antibodies are human antigen-binding antibody fragments of the present invention, which include, but are not limited to, Fab, Fab', F(ab)2 and F(ab')2, Fd, single-chain Fvs (scFv), single-chain antibodies, disulfide-linked Fvs (sdFv) and fragments comprising either a VL or VH domain. The antibodies may be from any animal origin, including birds and mammals. Preferably, the antibodies are human, murine, rabbit, goat, guinea pig, camel, horse, or chicken.

[0370] Antigen-binding antibody fragments, including single-chain antibodies, may comprise the variable region(s) alone or in combination with all or part of the following: hinge region, CH1, CH2, and CH3 domains. Also included in the invention are any combinations of variable region(s) and hinge region, CH1, CH2, and CH3 domains. The present invention further includes chimeric, humanized, and human monoclonal and polyclonal antibodies which specifically bind the polypeptides of the present invention. The present invention further includes antibodies which are anti-idiotypic to the antibodies of the present invention.
[0371] The antibodies of the present invention may be monospecific, bispecific, trispecific or of greater multispecificity. Multispecific antibodies may be specific for different epitopes of a polypeptide of the present invention or may be specific for both a polypeptide of the present invention and a heterologous composition, such as a heterologous polypeptide or solid support material. See, e.g., WO 93/17715; WO 92/08802; WO 91/00360; WO 92/05793; Tutt, A. et al. (1991) J. Immunol. 147:60-69; US Patents 5,573,920, 4,474,893, 5,601,819, 4,714,681, 4,925,648; Kostelny, S.A. et al. (1992) J. Immunol. 148:1547-1553.
[0372] In some embodiments, the antibodies may be capable of specifically binding to a protein or polypeptide encoded by EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids. In some embodiments, the antibody may be capable of binding an antigenic determinant or an epitope in a protein or polypeptide encoded by EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids.
[0373] In other embodiments, the antibodies may be capable of specifically binding to an EST-related polypeptide, a fragment of an EST-related polypeptide, a positional segment of an EST-related polypeptide or a fragment of a positional segment of an EST-related polypeptide. In some embodiments, the antibody may be capable of binding an antigenic determinant or an epitope in an EST-related polypeptide, fragment of an EST-related polypeptide, positional segment of an EST-related polypeptide or fragment of a positional segment of an EST-related polypeptide.
[0374] Antibodies of the present invention may be described or specified in terms of the epitope(s) or portion(s) of a polypeptide of the present invention which are recognized or specifically bound by the antibody.
[0375] In the case of secreted proteins, the antibodies may specifically bind a full-length protein encoded by a nucleic acid of the present invention, a mature protein (i.e., the protein generated by cleavage of the signal peptide) encoded by a nucleic acid of the present invention, or a signal peptide encoded by a nucleic acid of the present invention. Moreover, the epitope(s) or polypeptide portion(s) may be specified as described herein, e.g., by N-terminal and C-terminal positions, by size in contiguous amino acid residues, or as listed in the Tables and sequence listing. Antibodies which specifically bind any epitope or polypeptide of the present invention may also be excluded. Therefore, the present invention includes antibodies that specifically bind polypeptides of the present invention, and allows for the exclusion of the same.

[0376] Antibodies of the present invention may also 
be described or specified in terms of their cross-reactiv- 
ity. Antibodies that do not bind any other analog, or- 
tholog, or homolog of the polypeptides of the present 
invention are included. Antibodies that do not bind 
polypeptides with less than 95%, less than 90%, less 
than 85%, less than 80%, less than 75%, less than 70%, 
less than 65%, less than 60%, less than 55%, and less 
than 50% identity (as calculated using methods known 
in the art and described herein) to a polypeptide of the 
present invention are also included in the present inven- 
tion. Further included in the present invention are anti- 
bodies which only bind polypeptides encoded by poly- 
nucleotides which hybridize to a polynucleotide of the 
present invention under stringent hybridization condi- 
tions (as described herein). Antibodies of the present 
invention may also be described or specified in terms of 
their binding affinity. Preferred binding affinities include 
those with a dissociation constant or Kd less than 5 × 10^-6 M, 10^-6 M, 5 × 10^-7 M, 10^-7 M, 5 × 10^-8 M, 10^-8 M, 5 × 10^-9 M, 10^-9 M, 5 × 10^-10 M, 10^-10 M, 5 × 10^-11 M, 10^-11 M, 5 × 10^-12 M, 10^-12 M, 5 × 10^-13 M, 10^-13 M, 5 × 10^-14 M, 10^-14 M, 5 × 10^-15 M, and 10^-15 M.
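Because a smaller Kd means tighter binding, each successive value in the list of preferred affinities is a stricter requirement. The short sketch below, with hypothetical function names, encodes the twenty cut-offs from 5 × 10^-6 M down to 10^-15 M and reports the most stringent one that a measured Kd satisfies ("less than" is taken strictly).

```python
from typing import List, Optional


def affinity_thresholds() -> List[float]:
    """Kd cut-offs from 5e-6 M down to 1e-15 M, in descending order."""
    cutoffs = []
    for exponent in range(6, 16):
        # Parse decimal strings so each value is the correctly rounded float.
        cutoffs.append(float(f"5e-{exponent}"))
        cutoffs.append(float(f"1e-{exponent}"))
    return cutoffs


def tightest_threshold_met(kd_molar: float) -> Optional[float]:
    """Smallest listed cut-off that kd_molar is strictly below, or None if none."""
    met = [c for c in affinity_thresholds() if kd_molar < c]
    return min(met) if met else None


if __name__ == "__main__":
    for kd in (3e-9, 5e-9, 2e-5):
        print(kd, tightest_threshold_met(kd))
```

For example, a Kd of 3 × 10^-9 M falls below 5 × 10^-9 M and every larger cut-off, but not below 10^-9 M.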
[0377] Antibodies of the present invention have uses 
that include, but are not limited to, methods known in 
the art to purify, detect, and target the polypeptides of 
the present invention including both in vitro and in vivo 
diagnostic and therapeutic methods. For example, the 
antibodies have use in immunoassays for qualitatively 
and quantitatively measuring levels of the polypeptides 
of the present invention in biological samples. See, e.g., Harlow et al., ANTIBODIES: A LABORATORY MANUAL (Cold Spring Harbor Laboratory Press, 2nd ed. 1988) (incorporated by reference in its entirety).
[0378] The antibodies of the present invention may be used either alone or in combination with other compositions. The antibodies may further be recombinantly fused to a heterologous polypeptide at the N- or C-terminus or chemically conjugated (including covalent and noncovalent conjugations) to polypeptides or other compositions. For example, antibodies of the present invention may be recombinantly fused or conjugated to molecules useful as labels in detection assays and to effector molecules such as heterologous polypeptides, drugs, or toxins. See, e.g., WO 92/08495; WO 91/14438; WO 89/12624; US Patent 5,314,995; and EP 0 396 387.

[0379] The antibodies of the present invention may be prepared by any suitable method known in the art. For example, a polypeptide of the present invention or an antigenic fragment thereof can be administered to an animal in order to induce the production of sera containing polyclonal antibodies. The term "monoclonal antibody" is not limited to antibodies produced through hybridoma technology. The term "antibody" refers to a polypeptide or group of polypeptides comprising at least one binding domain, where a binding domain is formed from the folding of variable domains of an antibody molecule to form three-dimensional binding spaces with an internal surface shape and charge distribution complementary to the features of an antigenic determinant of an antigen, which allows an immunological reaction with the antigen. The term "monoclonal antibody" refers to an antibody that is derived from a single clone, including any eukaryotic, prokaryotic, or phage clone, and not to the method by which it is produced. Monoclonal antibodies can be prepared using a wide variety of techniques known in the art, including the use of hybridoma, recombinant and phage display technology.

[0380] Hybridoma techniques include those known in the art (see, e.g., Harlow et al., ANTIBODIES: A LABORATORY MANUAL (Cold Spring Harbor Laboratory Press, 2nd ed. 1988); Hammerling et al., in: MONOCLONAL ANTIBODIES AND T-CELL HYBRIDOMAS 563-681 (Elsevier, N.Y., 1981)) (said references incorporated by reference in their entireties). Fab and F(ab')2 fragments may be produced, for example, from hybridoma-produced antibodies by proteolytic cleavage, using enzymes such as papain (to produce Fab fragments) or pepsin (to produce F(ab')2 fragments).
[0381] Alternatively, antibodies of the present invention can be produced through the application of recombinant DNA technology or through synthetic chemistry using methods known in the art. For example, the antibodies of the present invention can be prepared using various phage display methods known in the art. In phage display methods, functional antibody domains are displayed on the surface of a phage particle which carries the polynucleotide sequences encoding them. Phage with a desired binding property are selected from a repertoire or combinatorial antibody library (e.g., human or murine) by selecting directly with antigen, typically antigen bound or captured to a solid surface or bead. Phage used in these methods are typically filamentous phage, including fd and M13, with Fab, Fv or disulfide-stabilized Fv antibody domains recombinantly fused to either the phage gene III or gene VIII protein. Examples of phage display methods that can be used to make the antibodies of the present invention include those disclosed in Brinkman U. et al. (1995) J. Immunol. Methods 182:41-50; Ames, R.S. et al. (1995) J. Immunol. Methods 184:177-186; Kettleborough, C.A. et al. (1994) Eur. J. Immunol. 24:952-958; Persic, L. et al. (1997) Gene 187:9-18; Burton, D.R. et al. (1994) Advances in Immunology 57:191-280; PCT/GB91/01134; WO 90/02809; WO 91/10737; WO 92/01047; WO 92/18619; WO 93/11236; WO 95/15982; WO 95/20401; and US Patents 5,698,426, 5,223,409, 5,403,484, 5,580,717, 5,427,908, 5,750,753, 5,821,047, 5,571,698, 5,427,908, 5,516,637, 5,780,225, 5,658,727 and 5,733,743 (said references incorporated by reference in their entireties).

[0382] As described in the above references, after phage selection, the antibody coding regions from the phage can be isolated and used to generate whole antibodies, including human antibodies, or any other desired antigen-binding fragment, and expressed in any desired host, including mammalian cells, insect cells, plant cells, yeast, and bacteria. For example, techniques to recombinantly produce Fab, Fab', F(ab)2 and F(ab')2 fragments can also be employed using methods known in the art such as those disclosed in WO 92/22324; Mullinax, R.L. et al. (1992) BioTechniques 12(6):864-869; Sawai, H. et al. (1995) AJRI 34:26-34; and Better, M. et al. (1988) Science 240:1041-1043 (said references incorporated by reference in their entireties).

[0383] Examples of techniques which can be used to produce single-chain Fvs and antibodies include those described in U.S. Patents 4,946,778 and 5,258,498; Huston et al. (1991) Methods in Enzymology 203:46-88; Shu, L. et al. (1993) PNAS 90:7995-7999; and Skerra, A. et al. (1988) Science 240:1038-1040. For some uses, including in vivo use of antibodies in humans and in vitro detection assays, it may be preferable to use chimeric, humanized, or human antibodies. Methods for producing chimeric antibodies are known in the art. See, e.g., Morrison, Science 229:1202 (1985); Oi et al., BioTechniques 4:214 (1986); Gillies, S.D. et al. (1989) J. Immunol. Methods 125:191-202; and US Patent 5,807,715. Antibodies can be humanized using a variety of techniques including CDR-grafting (EP 0 239 400; WO 91/09967; US Patents 5,530,101 and 5,585,089), veneering or resurfacing (EP 0 592 106; EP 0 519 596; Padlan E.A. (1991) Molecular Immunology 28(4/5):489-498; Studnicka G.M. et al. (1994) Protein Engineering 7(6):805-814; Roguska M.A. et al. (1994) PNAS 91:969-973), and chain shuffling (US Patent 5,565,332). Human antibodies can be made by a variety of methods known in the art, including the phage display methods described above. See also US Patents 4,444,887, 4,716,111, 5,545,806, and 5,814,318; WO 98/46645; WO 98/50433; WO 98/24893; WO 96/34096; WO 96/33735; and WO 91/10741 (said references incorporated by reference in their entireties).
[0384] Further included in the present invention are antibodies recombinantly fused or chemically conjugated (including both covalent and non-covalent conjugations) to a polypeptide of the present invention. The antibodies may be specific for antigens other than polypeptides of the present invention. For example, antibodies may be used to target the polypeptides of the present invention to particular cell types, either in vitro or in vivo, by fusing or conjugating the polypeptides of the present invention to antibodies specific for particular cell surface receptors. Antibodies fused or conjugated to the polypeptides of the present invention may also be used in in vitro immunoassays and purification methods using methods known in the art. See, e.g., Harlow et al., supra, and WO 93/21232; EP 0 439 095; Naramura, M. et al. (1994) Immunol. Lett. 39:91-99; US Patent 5,474,981; Gillies, S.D. et al. (1992) PNAS 89:1428-1432; Fell, H.P. et al. (1991) J. Immunol. 146:2446-2452 (said references incorporated by reference in their entireties).

[0385] The present invention further includes compositions comprising the polypeptides of the present invention fused or conjugated to antibody domains other than the variable regions. For example, the polypeptides of the present invention may be fused or conjugated to an antibody Fc region, or portion thereof. The antibody portion fused to a polypeptide of the present invention may comprise the hinge region, CH1 domain, CH2 domain, and CH3 domain or any combination of whole domains or portions thereof. The polypeptides of the present invention may be fused or conjugated to the above antibody portions to increase the in vivo half-life of the polypeptides or for use in immunoassays using methods known in the art. The polypeptides may also be fused or conjugated to the above antibody portions to form multimers. For example, Fc portions fused to the polypeptides of the present invention can form dimers through disulfide bonding between the Fc portions. Higher multimeric forms can be made by fusing the polypeptides to portions of IgA and IgM. Methods for fusing or conjugating the polypeptides of the present invention to antibody portions are known in the art. See, e.g., US Patents 5,336,603, 5,622,929, 5,359,046, 5,349,053, 5,447,851, 5,112,946; EP 0 307 434, EP 0 367 166; WO 96/04388, WO 91/06570; Ashkenazi, A. et al. (1991) PNAS 88:10535-10539; Zheng, X.X. et al. (1995) J. Immunol. 154:5590-5600; and Vil, H. et al. (1992) PNAS 89:11337-11341 (said references incorporated by reference in their entireties).
[0386] The invention further relates to antibodies which act as agonists or antagonists of the polypeptides of the present invention. For example, the present invention includes antibodies which disrupt the receptor/ligand interactions with the polypeptides of the invention either partially or fully. Included are both receptor-specific antibodies and ligand-specific antibodies. Included are receptor-specific antibodies which do not prevent ligand binding but prevent receptor activation. Receptor activation (i.e., signaling) may be determined by techniques described herein or otherwise known in the art. Also included are receptor-specific antibodies which prevent both ligand binding and receptor activation. Likewise, included are neutralizing antibodies which bind the ligand and prevent binding of the ligand to the receptor, as well as antibodies which bind the ligand, thereby preventing receptor activation, but do not prevent the ligand from binding the receptor. Further included are antibodies which activate the receptor. These antibodies may act as agonists for either all or less than all of the biological activities affected by ligand-mediated receptor activation. The antibodies may be specified as agonists or antagonists for biological activities comprising specific activities disclosed herein. The above antibody agonists can be made using methods known in the art. See, e.g., WO 96/40281; US Patent 5,811,097; Deng, B. et al. (1998) Blood 92(6):1981-1988; Chen, Z. et al. (1998) Cancer Res. 58(16):3668-3678; Harrop, J.A. et al. (1998) J. Immunol. 161(4):1786-1794; Zhu, Z. et al. (1998) Cancer Res. 58(15):3209-3214; Yoon, D.Y. et al. (1998) J. Immunol. 160(7):3170-3179; Prat, M. et al. (1998) J. Cell. Sci. 111(Pt2):237-247; Pitard, V. et al. (1997) J. Immunol. Methods 205(2):177-190; Liautard, J. et al. (1997) Cytokine 9(4):233-241; Carlson, N.G. et al. (1997) J. Biol. Chem. 272(17):11295-11301; Taryman, R.E. et al. (1995) Neuron 14(4):755-762; Muller, Y.A. et al. (1998) Structure 6(9):1153-1167; Bartunek, P. et al. (1996) Cytokine 8(1):14-20 (said references incorporated by reference in their entireties).

[0387] As discussed above, antibodies to the polypeptides of the invention can, in turn, be utilized to generate anti-idiotypic antibodies that "mimic" polypeptides of the invention using techniques well known to those skilled in the art. See, e.g., Greenspan and Bona, FASEB J. 7(5):437-444 (1989); Nissinoff, J. Immunol. 147(8):2429-2438 (1991). For example, antibodies which bind to and competitively inhibit polypeptide multimerization or binding of a polypeptide of the invention to ligand can be used to generate anti-idiotypes that "mimic" the polypeptide multimerization or binding domain and, as a consequence, bind to and neutralize the polypeptide or its ligand. Such neutralizing anti-idiotypic antibodies can be used to bind a polypeptide of the invention or to bind its ligands/receptors, and thereby block its biological activity.

EXAMPLE 33 

Production of an Antibody to a Human Polypeptide or Protein

[0388] The above-described EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids, or nucleic acids encoding EST-related polypeptides, fragments of EST-related polypeptides, positional segments of EST-related polypeptides or fragments of positional segments of EST-related polypeptides, are operably linked to promoters and introduced into cells as described above.

[0389] In the case of secreted proteins, nucleic acids encoding the full protein (i.e., the mature protein and the signal peptide), nucleic acids encoding the mature protein (i.e., the protein generated by cleavage of the signal peptide), or nucleic acids encoding the signal peptide are operably linked to promoters and introduced into cells as described above.

[0390] The encoded proteins or polypeptides are then substantially purified or isolated as described above. The concentration of protein in the final preparation is adjusted, for example, by concentration on an Amicon filter device, to the level of a few µg/ml. Monoclonal or polyclonal antibody to the protein or polypeptide can then be prepared as follows:

1. Monoclonal Antibody Production by Hybridoma Fusion

[0391] Monoclonal antibody to epitopes of any of the proteins or polypeptides identified and isolated as described can be prepared from murine hybridomas according to the classical method of Kohler and Milstein, Nature 256:495 (1975), or derivative methods thereof. Briefly, a mouse is repetitively inoculated with a few micrograms of the selected protein, or peptides derived therefrom, over a period of a few weeks. The mouse is then sacrificed, and the antibody-producing cells of the spleen are isolated. The spleen cells are fused with mouse myeloma cells by means of polyethylene glycol, and the excess unfused cells are destroyed by growth of the system on selective media comprising aminopterin (HAT media). The successfully fused cells are diluted, and aliquots of the dilution are placed in wells of a microtiter plate where growth of the culture is continued. Antibody-producing clones are identified by detection of antibody in the supernatant fluid of the wells by immunoassay procedures, such as ELISA, as originally described by Engvall, Meth. Enzymol. 70:419 (1980), the disclosure of which is incorporated herein by reference, and derivative methods thereof. Selected positive clones can be expanded and their monoclonal antibody product harvested for use. Detailed procedures for monoclonal antibody production are described in Davis, L. et al., Basic Methods in Molecular Biology, Elsevier, New York, Section 21-2, the disclosure of which is incorporated herein by reference.
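The dilution step above is often planned with a Poisson model. This model is an assumption here; the text does not state it. If hybridoma cells are seeded at an average of λ cells per well, the fraction of wells receiving exactly one cell is λe^(-λ). Assuming every seeded cell grows, the fraction of growing wells that started from a single cell is λe^(-λ) / (1 - e^(-λ)). The seeding densities in this minimal sketch are illustrative only.

```python
import math


def poisson_well_fractions(mean_cells_per_well: float) -> dict:
    """Expected well fractions for limiting dilution, assuming cells are
    Poisson-distributed among wells and every seeded cell grows."""
    lam = mean_cells_per_well
    p_empty = math.exp(-lam)          # P(0 cells in a well)
    p_single = lam * math.exp(-lam)   # P(exactly 1 cell)
    p_positive = 1.0 - p_empty        # P(at least 1 cell), i.e. wells that grow
    return {
        "empty": p_empty,
        "single": p_single,
        "positive": p_positive,
        # Probability that a growing well is monoclonal
        "single_given_positive": p_single / p_positive if p_positive > 0 else 0.0,
    }


for lam in (0.3, 1.0, 3.0):  # illustrative seeding densities
    f = poisson_well_fractions(lam)
    print(f"lambda={lam}: {f['positive']:.2f} of wells grow, "
          f"{f['single_given_positive']:.2f} of those are monoclonal")
```

Lower seeding densities yield fewer growing wells, but each growing well is more likely to be clonal. This trade-off is one reason hybridoma lines are often subcloned more than once.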

2. Polyclonal Antibody Production by Immunization

[0392] Polyclonal antiserum containing antibodies to heterogeneous epitopes of a single protein or polypeptide can be prepared by immunizing suitable animals with the expressed protein, or peptides derived therefrom, which can be unmodified or modified to enhance immunogenicity. Effective polyclonal antibody production is affected by many factors related both to the antigen and to the host species. For example, small molecules tend to be less immunogenic than larger ones and may require the use of carriers and adjuvant. Also, host animal responses vary depending on the site of inoculation and the dose, with both inadequate and excessive doses of antigen resulting in low-titer antisera. Small doses (ng level) of antigen administered at multiple intradermal sites appear to be most reliable. An effective immunization protocol for rabbits can be found in Vaitukaitis et al., J. Clin. Endocrinol. Metab. 33:988-991 (1971), the disclosure of which is incorporated herein by reference.
[0393] Booster injections can be given at regular intervals, and antiserum harvested when the antibody titer thereof, as determined semi-quantitatively (for example, by double immunodiffusion in agar against known concentrations of the antigen), begins to fall. See, for example, Ouchterlony et al., Chap. 19 in: Handbook of Experimental Immunology, D. Weir (ed.), Blackwell (1973), the disclosure of which is incorporated herein by reference. Plateau concentration of antibody is usually in the range of 0.1 to 0.2 mg/ml of serum (on the order of 1 µM). Affinity of the antisera for the antigen is determined by preparing competitive binding curves, as described, for example, by Fisher, D., Chap. 42 in: Manual of Clinical Immunology, 2d Ed. (Rose and Friedman, Eds.), Amer. Soc. for Microbiol., Washington, D.C. (1980), the disclosure of which is incorporated herein by reference.
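As a rough check when converting the plateau titer to molar units, assume the antibody is IgG with a molar mass of about 150 kDa. The text does not state the isotype, so this is an assumption. Note that 0.1 mg/ml equals 0.1 g/L:

$$
c=\frac{\rho}{M_{\mathrm{IgG}}}:\qquad
\frac{0.1\ \mathrm{g\,L^{-1}}}{1.5\times10^{5}\ \mathrm{g\,mol^{-1}}}\approx 6.7\times10^{-7}\ \mathrm{M},\qquad
\frac{0.2\ \mathrm{g\,L^{-1}}}{1.5\times10^{5}\ \mathrm{g\,mol^{-1}}}\approx 1.3\times10^{-6}\ \mathrm{M}
$$

That is, roughly 0.7 to 1.3 µM of antibody at plateau.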
[0394] Antibody preparations prepared according to either of the above protocols are useful in a variety of contexts. In particular, the antibodies may be used in immunoaffinity chromatography techniques such as those described below to facilitate large-scale isolation, purification, or enrichment of the proteins or polypeptides encoded by EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids, or for the isolation, purification or enrichment of EST-related polypeptides, fragments of EST-related polypeptides, positional segments of EST-related polypeptides or fragments of positional segments of EST-related polypeptides.

[0395] In the case of secreted proteins, the antibodies may be used for the isolation, purification, or enrichment of the full protein (i.e., the mature protein and the signal peptide), the mature protein (i.e., the protein generated by cleavage of the signal peptide), or the signal peptide.

[0396] Additionally, the antibodies may be used in immunoaffinity chromatography techniques such as those described below to isolate, purify, or enrich polypeptides which have been linked to the proteins or polypeptides encoded by EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids, or to isolate, purify, or enrich EST-related polypeptides, fragments of EST-related polypeptides, positional segments of EST-related polypeptides or fragments of positional segments of EST-related polypeptides.
[0397] The antibodies may also be used to determine the cellular localization of the proteins or polypeptides encoded by EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids, or the cellular localization of EST-related polypeptides, fragments of EST-related polypeptides, positional segments of EST-related polypeptides or fragments of positional segments of EST-related polypeptides.

[0398] In addition, the antibodies may also be used to determine the cellular localization of polypeptides which have been linked to the proteins or polypeptides encoded by EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids, or of polypeptides which have been linked to EST-related polypeptides, fragments of EST-related polypeptides, positional segments of EST-related polypeptides or fragments of positional segments of EST-related polypeptides.
[0399] The antibodies may also be used in quantitative immunoassays which determine concentrations of antigen-bearing substances in biological samples. They may also be used semi-quantitatively or qualitatively to identify the presence of antigen in a biological sample or to identify the type of tissue present in a biological sample. The antibodies may also be used in therapeutic compositions for killing cells expressing the protein or reducing the levels of the protein in the body.
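Quantitative immunoassays of this kind are usually read against a standard curve. A common choice is the four-parameter logistic (4PL), y = d + (a - d) / (1 + (x/c)^b). Here a is the response at zero concentration, d the response at saturation, c the inflection point and b the slope. The text does not name a curve model, so the 4PL is an assumption. This sketch assumes the four parameters have already been fitted to standards and shows only the inversion that turns a sample signal into a concentration. The parameter values are hypothetical.

```python
def four_pl(x: float, a: float, b: float, c: float, d: float) -> float:
    """4PL response for a concentration x >= 0."""
    return d + (a - d) / (1.0 + (x / c) ** b)


def inverse_four_pl(y: float, a: float, b: float, c: float, d: float) -> float:
    """Concentration that gives response y; y must lie strictly between a and d."""
    lo, hi = min(a, d), max(a, d)
    if not lo < y < hi:
        raise ValueError("signal outside the quantifiable range of the curve")
    # Solve y - d = (a - d) / (1 + (x/c)^b) for x
    return c * ((a - d) / (y - d) - 1.0) ** (1.0 / b)


params = dict(a=0.05, b=1.2, c=40.0, d=2.5)  # hypothetical absorbance curve, c in ng/ml
signal = four_pl(25.0, **params)             # simulated reading for a 25 ng/ml sample
print(round(inverse_four_pl(signal, **params), 6))  # 25.0
```

Signals outside the range between the two asymptotes are rejected rather than extrapolated, because the inverse is undefined or unreliable there.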

VI. Use of 5' ESTs and Consensus Contigated ESTs or Sequences Obtainable Therefrom or Portions Thereof as Reagents

[0400] The EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids may be used as reagents in isolation procedures, diagnostic assays, and forensic procedures. For example, sequences from the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids may be detectably labeled and used as probes to isolate other sequences capable of hybridizing to them. In addition, the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids may be used to design PCR primers for use in isolation, diagnostic, or forensic procedures.

1. Use of EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids in isolation, diagnostic and forensic procedures

EXAMPLE 34 


Preparation of PCR Primers and Amplification of DNA 

[0401] The EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids may be used to prepare PCR primers for a variety of applications, including isolation procedures for cloning nucleic acids capable of hybridizing to such sequences, diagnostic techniques and forensic techniques. In some embodiments, the PCR primers are at least 10, 15, 18, 20, 23, 25, 26, 30, 40, or 50 nucleotides in length. In some embodiments, the PCR primers may be more than 30 bases in length. It is preferred that the two primers of a pair have approximately the same G/C ratio, so that their melting temperatures are approximately the same. A variety of PCR techniques are familiar to those skilled in the art. For a review of PCR technology, see Molecular Cloning to Genetic Engineering, White, B.A., Ed., in Methods in Molecular Biology 67, Humana Press, Totowa (1997), the disclosure of which is incorporated herein by reference. In each of these PCR procedures, PCR primers on either side of the nucleic acid sequences to be amplified are added to a suitably prepared nucleic acid sample along with dNTPs and a thermostable polymerase such as Taq polymerase, Pfu polymerase, or Vent polymerase. The nucleic acid in the sample is denatured and the PCR primers are specifically hybridized to complementary nucleic acid sequences in the sample. The hybridized primers are then extended. Thereafter, another cycle of denaturation, hybridization, and extension is initiated. The cycles are repeated multiple times to produce an amplified fragment containing the nucleic acid sequence between the primer sites.
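The preference above for paired primers with similar G/C content can be checked quickly. This sketch uses the Wallace rule, Tm ≈ 2 °C × (A+T) + 4 °C × (G+C). The rule is a rough estimate intended for short oligonucleotides, and nearest-neighbour models are more accurate; it is a stand-in chosen here, not a method given in the text. The primer sequences and the 2 °C tolerance are hypothetical.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a primer sequence."""
    s = seq.upper()
    if not s:
        raise ValueError("empty sequence")
    return (s.count("G") + s.count("C")) / len(s)


def wallace_tm(seq: str) -> float:
    """Rough melting temperature in degrees C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    s = seq.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2.0 * at + 4.0 * gc


def primers_balanced(fwd: str, rev: str, max_tm_diff: float = 2.0) -> bool:
    """True if the estimated melting temperatures differ by at most max_tm_diff."""
    return abs(wallace_tm(fwd) - wallace_tm(rev)) <= max_tm_diff


fwd = "AGCTGACCTGAAGTCCAGTA"  # hypothetical primer pair
rev = "TCAGGTTCAGCATCGTACAG"
print(gc_fraction(fwd), gc_fraction(rev))  # 0.5 0.5
print(wallace_tm(fwd), wallace_tm(rev))    # 60.0 60.0
print(primers_balanced(fwd, rev))          # True
```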

EXAMPLE 35 

Use of the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids as probes

[0402] Probes derived from EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids may be labeled with detectable labels familiar to those skilled in the art, including radioisotopes and nonradioactive labels, to provide a detectable probe. The detectable probe may be single stranded or double stranded and may be made using techniques known in the art, including in vitro transcription, nick translation, or kinase reactions. A nucleic acid sample containing a sequence capable of hybridizing to the labeled probe is contacted with the labeled probe. If the nucleic acid in the sample is double stranded, it may be denatured prior to contacting the probe. In some applications, the nucleic acid sample may be immobilized on a surface such as a nitrocellulose or nylon membrane. The nucleic acid sample may comprise nucleic acids obtained from a variety of sources, including genomic DNA, cDNA libraries, RNA, or tissue samples.

[0403] Procedures used to detect the presence of nucleic acids capable of hybridizing to the detectable probe include well-known techniques such as Southern blotting, Northern blotting, dot blotting, colony hybridization, and plaque hybridization. In some applications, the nucleic acid capable of hybridizing to the labeled probe may be cloned into vectors such as expression vectors, sequencing vectors, or in vitro transcription vectors to facilitate the characterization and expression of the hybridizing nucleic acids in the sample. For example, such techniques may be used to isolate and clone sequences in a genomic library or cDNA library which are capable of hybridizing to the detectable probe as described in Example 18 above.
[0404] PCR primers made as described in Example 34 above may be used in forensic analyses, such as the DNA fingerprinting techniques described in Examples 36-40 below. Such analyses may utilize detectable probes or primers based on the sequences of the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids.

EXAMPLE 36 

Forensic Matching by DNA Sequencing 

[0405] In one exemplary method, DNA samples are isolated from forensic specimens of, for example, hair, semen, blood or skin cells by conventional methods. A panel of PCR primers based on a number of the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids is then utilized in accordance with Example 34 to amplify DNA of approximately 100-200 bases in length from the forensic specimen. Corresponding sequences are obtained from a test subject. Each of these identification DNAs is then sequenced using standard techniques, and a simple database comparison determines the differences, if any, between the sequences from the subject and those from the sample. Statistically significant differences between the suspect's DNA sequences and those from the sample conclusively prove a lack of identity. This lack of identity can be proven, for example, with only one sequence. Identity, on the other hand, should be demonstrated with a large number of sequences, all matching. Preferably, a minimum of 50 statistically identical sequences of 100 bases in length are used to prove identity between the suspect and the sample.
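The database comparison in this example amounts to pairing each identification sequence from the specimen with the corresponding sequence from the test subject and counting the differences. This is a minimal sketch. It assumes the amplicons are already aligned and of equal length; a real pipeline would align first and account for sequencing error. The locus names and sequences are hypothetical.

```python
from typing import Dict


def mismatches(a: str, b: str) -> int:
    """Count differing bases between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return sum(1 for x, y in zip(a.upper(), b.upper()) if x != y)


def compare_panels(specimen: Dict[str, str], subject: Dict[str, str]) -> Dict[str, int]:
    """Per-locus mismatch counts for the loci sequenced in both samples."""
    shared = sorted(specimen.keys() & subject.keys())
    return {locus: mismatches(specimen[locus], subject[locus]) for locus in shared}


def summarize(diffs: Dict[str, int]) -> str:
    # Following the text, one confirmed difference is enough to exclude identity,
    # whereas a match is only meaningful when many loci agree.
    if any(n > 0 for n in diffs.values()):
        return "excluded"
    return f"consistent at {len(diffs)} loci"


specimen = {"locus_A": "ACGTACGTAC", "locus_B": "GGCATTACGA"}  # hypothetical
subject = {"locus_A": "ACGTACGTAC", "locus_B": "GGCATTACGT"}
print(compare_panels(specimen, subject))             # {'locus_A': 0, 'locus_B': 1}
print(summarize(compare_panels(specimen, subject)))  # excluded
```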

EXAMPLE 37 

Positive Identification by DNA Sequencing 

[0406] The technique outlined in the previous example may also be used on a larger scale to provide a unique fingerprint-type identification of any individual. In this technique, primers are prepared from a large number of EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids. Preferably, 20 to 50 different primers are used. These primers are used to obtain a corresponding number of PCR-generated DNA segments from the individual in question in accordance with Example 34. Each of these DNA segments is sequenced using the methods set forth in Example 36. The database of sequences generated through this procedure uniquely identifies the individual from whom the sequences were obtained. The same panel of primers may then be used at any later time to absolutely correlate tissue or other biological specimens with that individual.


EXAMPLE 38 

Southern Blot Forensic Identification 

[0407] The procedure of Example 37 is repeated to obtain a panel of at least 10 amplified sequences from an individual and a specimen. Preferably, the panel contains at least 50 amplified sequences. More preferably, the panel contains 100 amplified sequences. In some embodiments, the panel contains 200 amplified sequences. This PCR-generated DNA is then digested with one or a combination of, preferably, four-base-specific restriction enzymes. Such enzymes are commercially available and known to those of skill in the art. After digestion, the resultant gene fragments are size-separated in multiple duplicate wells on an agarose gel and transferred to nitrocellulose using Southern blotting techniques well known to those with skill in the art. For a review of Southern blotting, see Davis et al. (Basic Methods in Molecular Biology, 1986, Elsevier Press, pp. 62-65), the disclosure of which is incorporated herein by reference.
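The fragments that a four-base cutter produces from an amplified segment can be predicted in silico before running the gel. This sketch assumes a palindromic recognition site and a known cut position within it. The example enzyme is MspI (C^CGG), which the text does not name. Only the given strand is scanned, which is sufficient for palindromic sites. The amplicon sequence is hypothetical.

```python
from typing import List


def digest(seq: str, site: str, cut_offset: int) -> List[int]:
    """Fragment lengths after cutting `seq` at every occurrence of a palindromic `site`.

    cut_offset is where, within the site, the enzyme cuts the top strand
    (1 for MspI, whose site is C^CGG). Only the given strand is scanned,
    which is enough for palindromic sites.
    """
    s, site = seq.upper(), site.upper()
    cuts = []
    start = s.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = s.find(site, start + 1)
    bounds = [0] + cuts + [len(s)]
    # Drop zero-length pieces that arise if a cut falls exactly at either end
    return [end - begin for begin, end in zip(bounds, bounds[1:]) if end > begin]


amplicon = "AAACCGGTTTTCCGGAA"      # hypothetical amplified segment
print(digest(amplicon, "CCGG", 1))  # [4, 8, 5]
```

Differences between individuals in the presence or position of such sites change these fragment lengths, and that is what the Southern blot band pattern reflects.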

[0408] A panel of probes based on the sequences of the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids is radioactively or colorimetrically labeled using methods known in the art, such as nick translation or end labeling, and hybridized to the Southern blot using techniques known in the art (Davis et al., supra). Preferably, the probes are at least 10, 12, 15, 18, 20, 25, 28, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400 or 500 nucleotides in length. In some embodiments, the probes are oligonucleotides which are 40 nucleotides in length or less.

[0409] Preferably, at least 5 to 10 of these labeled probes are used, and more preferably at least about 20 or 30 are used to provide a unique pattern. The resultant bands appearing from the hybridization of a large sample of EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids will be a unique identifier. Since the restriction enzyme cleavage will be different for every individual, the band pattern on the Southern blot will also be unique. Increasing the number of probes will provide a statistically higher level of confidence in the identification, since there will be an increased number of sets of bands used for identification.
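Why more probes give higher confidence can be shown with the product rule. Suppose the band patterns at different probe loci are statistically independent. This is an assumption that requires unlinked loci and population frequency data, neither of which the text supplies. Then the chance that an unrelated person matches at every locus is the product of the per-locus pattern frequencies. The frequencies in this sketch are hypothetical.

```python
import math
from typing import Sequence


def random_match_probability(pattern_freqs: Sequence[float]) -> float:
    """Probability that an unrelated individual matches every locus,
    assuming the per-locus pattern frequencies are independent."""
    for f in pattern_freqs:
        if not 0.0 < f <= 1.0:
            raise ValueError("each frequency must be in (0, 1]")
    # Summing logarithms avoids floating-point underflow for large panels.
    return math.exp(sum(math.log(f) for f in pattern_freqs))


for n_probes in (5, 10, 20):
    # Hypothetical: every probe's band pattern is shared by 10% of the population.
    print(n_probes, random_match_probability([0.1] * n_probes))
```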

EXAMPLE 39 


Dot Blot Identification Procedure 

[0410] Another technique for identifying individuals using the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids disclosed herein utilizes a dot blot hybridization technique.
[0411] Genomic DNA is isolated from nuclei of the subject to be identified. Probes are prepared that correspond to at least 10, preferably 50, sequences from the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids. The probes are used to hybridize to the genomic DNA under conditions known to those in the art. The oligonucleotides are end-labeled with 32P using polynucleotide kinase (Pharmacia). Dot blots are created by spotting the genomic DNA onto nitrocellulose or the like using a vacuum dot blot manifold (BioRad, Richmond, California). The genomic sequences are fixed to the nitrocellulose filter by baking or UV cross-linking, and the filter is prehybridized and hybridized with labeled probe using techniques known in the art (Davis et al., supra). The labeled DNA fragments are sequentially hybridized under successively more stringent conditions to detect minimal differences between the 30 bp sequence and the DNA. Tetramethylammonium chloride is useful for identifying clones containing small numbers of nucleotide mismatches (Wood et al., Proc. Natl. Acad. Sci. USA 82(6):1585-1588 (1985), which is hereby incorporated by reference). A unique pattern of dots distinguishes one individual from another individual.
[0412] EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids can be used as probes in the following alternative fingerprinting technique. In some embodiments, the probes are oligonucleotides which are 40 nucleotides in length or less.
[0413] Preferably, a plurality of probes having sequences from different EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids are used in the alternative fingerprinting technique. Example 40 below provides a representative alternative fingerprinting procedure in which the probes are derived from EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids.

EXAMPLE 40 


Alternative "Fingerprint" Identification Technique

[0414] Oligonucleotides are prepared from a large number (e.g., 50, 100, or 200) of EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids using commercially available oligonucleotide services such as Genset, Paris, France. Preferably, the oligonucleotides are at least 10, 15, 18, 20, 23, 25, 28, or 30 nucleotides in length. However, in some embodiments, the oligonucleotides may be more than 30 nucleotides in length.
[0415] Cell samples from the test subject are processed for DNA using techniques well known to those with skill in the art. The nucleic acid is digested with restriction enzymes such as EcoRI and XbaI. Following digestion, samples are applied to wells for electrophoresis. The procedure, as known in the art, may be modified to accommodate polyacrylamide electrophoresis; however, in this example, samples containing 5 µg of DNA are loaded into wells and separated on 0.8% agarose gels. The gels are transferred onto nitrocellulose using standard Southern blotting techniques.
[0416] 10 ng of each of the oligonucleotides are pooled and end-labeled with 32P. The nitrocellulose is prehybridized with blocking solution and hybridized with the labeled probes. Following hybridization and washing, the nitrocellulose filter is exposed to X-Omat AR X-ray film. The resulting hybridization pattern will be unique for each individual.

[0417] It is additionally contemplated within this ex- 
ample that the number of probe sequences used can be 
varied for additional accuracy or clarity. 
[0418] In addition to their applications in forensics and identification, EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids may be mapped to their chromosomal locations. Example 41 below describes radiation hybrid (RH) mapping of human chromosomal regions using EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids. Example 42 below describes a representative procedure for mapping EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids to their locations on human chromosomes. Example 43 below describes mapping of EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids on metaphase chromosomes by Fluorescence In Situ Hybridization (FISH).

2. Use of EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids in Chromosome Mapping

EXAMPLE 41 

Radiation hybrid mapping of EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids to the human genome

[0419] Radiation hybrid (RH) mapping is a somatic cell genetic approach that can be used for high-resolution mapping of the human genome. In this approach, cell lines containing one or more human chromosomes are lethally irradiated, breaking each chromosome into fragments whose size depends on the radiation dose. These fragments are rescued by fusion with cultured rodent cells, yielding subclones containing different portions of the human genome. This technique is described by Benham et al. (Genomics 4:509-517, 1989) and Cox et al. (Science 250:245-250, 1990), the entire contents of which are hereby incorporated by reference. The random and independent nature of the subclones permits efficient mapping of any human genome marker. Human DNA isolated from a panel of 80-100 cell lines provides a mapping reagent for ordering EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids. In this approach, the frequency of breakage between markers is used to measure distance, allowing construction of fine-resolution maps as has been done using conventional ESTs (Schuler et al., Science 274:540-546, 1996, hereby incorporated by reference).
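The conversion from breakage frequency to map distance is commonly made with the standard two-point model of radiation hybrid analysis. Each marker is retained in a hybrid with probability r, and the two markers are separated by a break with probability θ. The expected fraction of hybrids retaining exactly one of the two markers is then 2r(1 - r)θ, and the distance in rays is -ln(1 - θ), or 100 times that in centirays (cR). Distances in cR are specific to the radiation dose used to build the panel. This simplified estimator assumes equal retention for both markers and error-free typing. The hybrid vectors are hypothetical (1 = marker retained, 0 = absent).

```python
import math
from typing import Sequence, Tuple


def rh_two_point(a: Sequence[int], b: Sequence[int]) -> Tuple[float, float]:
    """Estimate breakage probability theta and distance in centirays (cR)
    between two markers typed on the same hybrid panel (1 = retained, 0 = not)."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("markers must be typed on the same non-empty panel")
    n = len(a)
    r = (sum(a) + sum(b)) / (2 * n)  # pooled retention frequency
    if r in (0.0, 1.0):
        raise ValueError("retention of 0 or 1 carries no linkage information")
    discordant = sum(1 for x, y in zip(a, b) if x != y) / n
    # Expected discordance is 2 r (1 - r) theta; cap theta so the log stays finite
    theta = min(discordant / (2 * r * (1 - r)), 0.999999)
    distance_cr = -100.0 * math.log(1.0 - theta)
    return theta, distance_cr


marker_1 = [1, 1, 0, 0, 1, 0, 1, 0, 1, 0]  # hypothetical 10-hybrid panel
marker_2 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
theta, dist = rh_two_point(marker_1, marker_2)
print(f"theta={theta:.3f}, distance={dist:.1f} cR")
```

Real panels of 80-100 hybrids, as mentioned above, are normally analyzed with multipoint maximum-likelihood software. This two-point estimate only illustrates how breakage frequency becomes distance.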
[0420] RH mapping has been used to generate a high-resolution whole genome radiation hybrid map of human chromosome 17q22-q25.3 across the genes for growth hormone (GH) and thymidine kinase (TK) (Foster et al., Genomics 33:185-192, 1996), and to map the region surrounding the Gorlin syndrome gene (Obermayr et al., Eur. J. Hum. Genet. 4:242-245, 1996), 60 loci covering the entire short arm of chromosome 12 (Raeymaekers et al., Genomics 29:170-178, 1995), the region of human chromosome 22 containing the neurofibromatosis type 2 locus (Frazer et al., Genomics 14:574-584, 1992) and 13 loci on the long arm of chromosome 5 (Warrington et al., Genomics 11:701-708, 1991).


EXAMPLE 42 

Mapping of EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids to Human Chromosomes using PCR techniques

[0421] EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids may be assigned to human chromosomes using PCR-based methodologies. In such approaches, oligonucleotide primer pairs are designed from EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids to minimize the chance of amplifying through an intron. Preferably, the oligonucleotide primers are 18-23 bp in length and are designed for PCR amplification. The creation of PCR primers from known sequences is well known to those with skill in the art. For a review of PCR technology, see Erlich, in PCR Technology: Principles and Applications for DNA Amplification, 1992, W.H. Freeman and Co., New York, the disclosure of which is incorporated herein by reference.
[0422] The primers are used in polymerase chain reactions (PCR) to amplify templates from total human genomic DNA. PCR conditions are as follows: 60 ng of genomic DNA is used as a template for PCR with 80 ng of each oligonucleotide primer, 0.6 unit of Taq polymerase, and 1 µCi of a 32P-labeled deoxycytidine triphosphate. The PCR is performed in a microplate thermocycler (Techne) under the following conditions: 30 cycles of 94°C, 1.4 min; 55°C, 2 min; and 72°C, 2 min; with a final extension at 72°C for 10 min. The amplified products are analyzed on a 6% polyacrylamide sequencing gel and visualized by autoradiography. If the length of the resulting PCR product is identical to the distance between the ends of the primer sequences in the 5' EST from which the primers are derived, then the PCR reaction is repeated with DNA templates from two panels of human-rodent somatic cell hybrids, BIOS PCRable DNA (BIOS Corporation) and NIGMS Human-Rodent Somatic Cell Hybrid Mapping Panel Number 1 (NIGMS, Camden, NJ).
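The size comparison above can be sketched in Python. This is a minimal illustration and is not part of the described protocol. It assumes exact, uppercase sequence matching and treats the preferred 18-23 bp primer length as a hard check. The primer and EST sequences are hypothetical. If the genomic product is the same length as the EST prediction, the primers probably did not span an intron.

```python
# Sketch: predict the PCR product length expected from an EST-derived primer
# pair and compare it with the size observed on the gel. Sequences used here
# are hypothetical; exact string matching is assumed.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def expected_product_length(est: str, fwd: str, rev: str) -> int:
    """Distance between the outer ends of the two primers on the EST."""
    for primer in (fwd, rev):
        if not 18 <= len(primer) <= 23:
            raise ValueError(f"{primer} is outside the preferred 18-23 bp range")
    start = est.find(fwd)
    if start < 0:
        raise ValueError("forward primer not found in EST")
    # The reverse primer anneals to the other strand, so its reverse
    # complement must lie downstream of the forward primer.
    rc_start = est.find(reverse_complement(rev), start + len(fwd))
    if rc_start < 0:
        raise ValueError("reverse primer site not found downstream")
    return rc_start + len(rev) - start


def same_as_est(observed_bp: int, expected_bp: int, tolerance_bp: int = 0) -> bool:
    """True when the genomic product matches the EST prediction, i.e. the
    primers did not amplify across an intron."""
    return abs(observed_bp - expected_bp) <= tolerance_bp


fwd = "GATCCTAGGCTAACGTTAGC"   # hypothetical 20-mer forward primer
rev = "TTGCAGCATGCCATGGAATC"   # hypothetical 20-mer reverse primer
est = "CC" + fwd + "A" * 60 + reverse_complement(rev) + "GG"
print(expected_product_length(est, fwd, rev))  # 100
```

A nonzero `tolerance_bp` can absorb gel-sizing error. Any value chosen for it is an assumption of the user and does not come from the protocol.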

[0423] PCR is used to screen a series of somatic cell hybrid cell lines containing defined sets of human chromosomes for the presence of a given 5' EST. DNA is isolated from the somatic hybrids and used as starting templates for PCR reactions using the primer pairs from the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids. Only those somatic cell hybrids with chromosomes containing the human gene corresponding to the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids will yield an amplified fragment. The 5' ESTs are assigned to a chromosome by analysis of the segregation pattern of PCR products from the somatic hybrid DNA templates. The single human chromosome present in all cell hybrids that give rise to an amplified fragment is the chromosome containing that EST-related nucleic acid, positional segment of an EST-related nucleic acid or fragment of a positional segment of an EST-related nucleic acid. For a review of techniques and analysis of results from somatic cell gene mapping experiments, see Ledbetter et al., Genomics 6:475-481 (1990), the disclosure of which is incorporated herein by reference.
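The segregation analysis can be expressed as a set intersection. In this sketch the panel contents, hybrid names and positive set are hypothetical. The text's rule is to take the chromosome common to all positive hybrids. As an added consistency check that the text does not specify, the sketch also removes any chromosome retained by a negative hybrid.

```python
from typing import Dict, Set


def assign_chromosome(panel: Dict[str, Set[str]], positive: Set[str]) -> Set[str]:
    """Human chromosome(s) consistent with a hybrid-panel PCR screen.

    panel maps each somatic cell hybrid to the human chromosomes it retains;
    positive holds the hybrids that gave an amplified fragment.
    """
    if not positive:
        return set()
    # Chromosomes present in every hybrid that amplified (returns a new set).
    candidates = set.intersection(*(panel[h] for h in positive))
    # Added check: discard chromosomes carried by hybrids that did not amplify.
    for hybrid, chromosomes in panel.items():
        if hybrid not in positive:
            candidates -= chromosomes
    return candidates


# Hypothetical panel: hybrid name -> retained human chromosomes.
panel = {"H1": {"1", "7"}, "H2": {"7", "12"}, "H3": {"3", "12"}, "H4": {"1", "3"}}
print(assign_chromosome(panel, {"H1", "H2"}))  # {'7'}
```

An empty result means the screen is inconsistent with a single-chromosome assignment. A result with several elements means the panel cannot separate those chromosomes.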

[0424] Alternatively, the EST-related nucleic acids, 
positional segments of EST-related nucleic acids or 
fragments of positional segments of EST-related nucleic 
acids may be mapped to individual chromosomes using 
FISH as described in Example 43 below. 
EXAMPLE 43

Mapping of EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids to
Chromosomes Using Fluorescence In Situ
Hybridization

[0425] Fluorescence in situ hybridization allows the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids to be mapped to a particular location on a given chromosome. The chromosomes to be used for fluorescence in situ hybridization techniques may be obtained from a variety of sources including cell cultures, tissues, or whole blood.
[0426] In a preferred embodiment, chromosomal localization of EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids is obtained by FISH as described by Cherif et al. (Proc. Natl. Acad. Sci. U.S.A., 87:6639-6643, 1990), the disclosure of which is incorporated herein by reference. Metaphase chromosomes are prepared from phytohemagglutinin (PHA)-stimulated blood cells from donors. PHA-stimulated lymphocytes from healthy males are cultured for 72 h in RPMI-1640 medium. For synchronization, methotrexate (10 µM) is added for 17 h, followed by addition of 5-bromodeoxyuridine (5-BrdU, 0.1 mM) for 6 h. Colcemid (1 µg/ml) is added for the last 15 min before harvesting the cells. Cells are collected, washed in RPMI, incubated with a hypotonic solution of KCl (75 mM) at 37°C for 15 min and fixed in three changes of methanol:acetic acid (3:1). The cell suspension is dropped onto a glass slide and air dried. The EST-related nucleic acid, positional segment of an EST-related nucleic acid or fragment of a positional segment of an EST-related nucleic acid is labeled with biotin-16 dUTP by nick translation according to the manufacturer's instructions (Bethesda Research Laboratories, Bethesda, MD), purified using a Sephadex G-50 column (Pharmacia, Uppsala, Sweden) and precipitated. Just prior to hybridization, the DNA pellet is dissolved in hybridization buffer (50% formamide, 2X SSC, 10% dextran sulfate, 1 mg/ml sonicated salmon sperm DNA, pH 7) and the probe is denatured at 70°C for 5-10 min.
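Preparing the hybridization buffer above is a series of C1V1 = C2V2 dilutions. The stock concentrations below are assumptions chosen for illustration: 100% formamide, 20X SSC, 50% dextran sulfate and 10 mg/ml salmon sperm DNA. The protocol specifies only the final concentrations. Adjusting the pH to 7 is not modelled.

```python
# Hypothetical stock solutions (assumed; the protocol gives only final values).
# name -> (final concentration, stock concentration), in matching units.
RECIPE = {
    "formamide (%)": (50.0, 100.0),
    "SSC (X)": (2.0, 20.0),
    "dextran sulfate (%)": (10.0, 50.0),
    "salmon sperm DNA (mg/ml)": (1.0, 10.0),
}


def stock_volume(final_conc: float, stock_conc: float, total_volume: float) -> float:
    """C1 * V1 = C2 * V2, solved for the stock volume V1."""
    return final_conc * total_volume / stock_conc


def hybridization_buffer(total_ul: float) -> dict:
    """Volume (µl) of each stock, plus water, for total_ul of buffer."""
    volumes = {name: stock_volume(final, stock, total_ul)
               for name, (final, stock) in RECIPE.items()}
    volumes["water"] = total_ul - sum(volumes.values())
    return volumes


print(hybridization_buffer(1000.0))
```

With these assumed stocks, 1 ml of buffer needs 500 µl formamide, 100 µl 20X SSC, 200 µl dextran sulfate stock, 100 µl DNA stock and 100 µl water.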

[0427] Slides kept at -20°C are treated for 1 h at 37°C with RNase A (100 ng/ml), rinsed three times in 2X SSC and dehydrated in an ethanol series. Chromosome preparations are denatured in 70% formamide, 2X SSC for 2 min at 70°C, then dehydrated at 4°C. The slides are treated with proteinase K (10 µg/100 ml in 20 mM Tris-HCl, 2 mM CaCl2) at 37°C for 8 min and dehydrated. The hybridization mixture containing the probe is placed on the slide, covered with a coverslip, sealed with rubber cement and incubated overnight in a humid chamber at 37°C. After hybridization and post-hybridization washes, the biotinylated probe is detected by avidin-FITC and amplified with additional layers of biotinylated goat anti-avidin and avidin-FITC. For chromosomal localization, fluorescent R-bands are obtained as previously described (Cherif et al., supra). The slides are observed under a LEICA fluorescence microscope (DMRXA). Chromosomes are counterstained with propidium iodide, and the fluorescent signal of the probe appears as two symmetrical yellow-green spots on both chromatids of the fluorescent R-band chromosome (red). Thus, a particular EST-related nucleic acid, positional segment of an EST-related nucleic acid or fragment of a positional segment of an EST-related nucleic acid may be localized to a particular cytogenetic R-band on a given chromosome.

[0428] Once the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids have been assigned to particular chromosomes using the techniques described in Examples 41-43 above, they may be utilized to construct a high resolution map of the chromosomes on which they are located or to identify the chromosomes in a sample.

EXAMPLE 44 

Use of EST-related nucleic acids, positional segments
of EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids to Construct or
Expand Chromosome Maps

[0429] Chromosome mapping involves assigning a given unique sequence to a particular chromosome as described above. Once the unique sequence has been mapped to a given chromosome, it is ordered relative to other unique sequences located on the same chromosome. One approach to chromosome mapping utilizes a series of yeast artificial chromosomes (YACs) bearing several thousand long inserts derived from the chromosomes of the organism from which the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids are obtained. This approach is described in Ramaiah Nagaraja et al., Genome Research 7:210-222, March 1997, the disclosure of which is incorporated herein by reference. Briefly, in this approach each chromosome is broken into overlapping pieces which are inserted into the YAC vector. The YAC inserts are screened using PCR or other methods to determine whether they include the EST-related nucleic acid, positional segment of an EST-related nucleic acid or fragment of a positional segment of an EST-related nucleic acid whose position is to be determined. Once an insert has been found which includes the 5' EST, the insert can be analyzed by PCR or other methods to determine whether the insert also contains other sequences known to be on the chromosome or in the region from which the EST-related nucleic acid, positional segment of an EST-related nucleic acid or fragment of a positional segment of an EST-related nucleic acid was derived. This process can be repeated for each insert in the YAC library to determine the location of each of the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids relative to one another and to other known chromosomal markers. In this way, a high resolution map of the distribution of numerous unique markers along each of the organism's chromosomes may be obtained.
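The insert-by-insert screening can be summarised by counting which other markers share EST-positive YAC inserts. This sketch uses hypothetical clone and marker names and a simple co-occurrence count. It is a simplified stand-in for full contig assembly such as that of Nagaraja et al. and does not reproduce that method.

```python
from collections import Counter
from typing import Dict, Set


def co_occurring_markers(yac_hits: Dict[str, Set[str]], est_id: str) -> Counter:
    """For each other marker, count the EST-positive YAC inserts that carry it."""
    counts: Counter = Counter()
    for markers in yac_hits.values():
        if est_id in markers:
            counts.update(m for m in markers if m != est_id)
    return counts


# Hypothetical PCR screening results: YAC clone -> markers detected in its insert.
hits = {
    "yA": {"EST-1", "M1", "M2"},
    "yB": {"EST-1", "M2"},
    "yC": {"M3", "M4"},
}
print(co_occurring_markers(hits, "EST-1").most_common())  # [('M2', 2), ('M1', 1)]
```

Markers that share more inserts with the EST are likely to lie closer to it. Ordering them precisely still requires the overlap analysis described above.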
[0430] As described in Example 45 below, EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids may also be used to identify genes associated with a particular phenotype, such as hereditary disease or drug response.

3. Use of EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids in Gene
Identification



EXAMPLE 45 

Identification of genes associated with hereditary 
diseases or drug response 

[0431] This example illustrates an approach useful for the association of EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids with particular phenotypic characteristics. In this example, a particular EST-related nucleic acid, positional segment of an EST-related nucleic acid or fragment of a positional segment of an EST-related nucleic acid is used as a test probe to associate that EST-related nucleic acid, positional segment or fragment with a particular phenotypic characteristic.
[0432] EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids are mapped to a particular location on a human chromosome using techniques such as those described in Examples 41 and 42 or other techniques known in the art. A search of Mendelian Inheritance in Man (V. McKusick, Mendelian Inheritance in Man, available online through the Johns Hopkins University Welch Medical Library) reveals the region of the human chromosome which contains the EST-related nucleic acid, positional segment of an EST-related nucleic acid or fragment of a positional segment of an EST-related nucleic acid to be a very gene-rich region containing several known genes and several diseases or phenotypes for which genes have not been identified. The gene corresponding to this EST-related nucleic acid, positional segment or fragment thus becomes an immediate candidate for each of these genetic diseases.
[0433] Cells from patients with these diseases or phenotypes are isolated and expanded in culture. PCR primers from the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids are used to screen genomic DNA, mRNA or cDNA obtained from the patients. EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids that are not amplified in the patients can be positively associated with a particular disease by further analysis. Alternatively, the PCR analysis may yield fragments of different lengths when the samples are derived from an individual having the phenotype associated with the disease than when the sample is derived from a healthy individual, indicating that the gene containing the EST-related nucleic acid, positional segment or fragment may be responsible for the genetic disease.
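The two outcomes described above can be tabulated per patient: no amplification, or a product of altered length. The 5 bp sizing tolerance and the patient identifiers are assumptions for illustration. A flagged result only marks a candidate for the further analysis the text calls for.

```python
from typing import Dict, Optional


def classify(control_bp: int, patient_bp: Optional[int], tolerance_bp: int = 5) -> str:
    """Compare one patient's PCR product with the healthy-control product.

    patient_bp is None when no product was amplified. tolerance_bp is an
    assumed gel-sizing tolerance, not a value from the text.
    """
    if patient_bp is None:
        return "not amplified"
    if abs(patient_bp - control_bp) > tolerance_bp:
        return "altered length"
    return "as control"


def flag_for_follow_up(control_bp: int, results: Dict[str, Optional[int]]) -> Dict[str, str]:
    """Patients whose result differs from the control, with the reason."""
    flagged = {}
    for patient, size in results.items():
        call = classify(control_bp, size)
        if call != "as control":
            flagged[patient] = call
    return flagged


# Hypothetical screen: patient ID -> product size in bp (None = no band).
print(flag_for_follow_up(420, {"P1": 420, "P2": None, "P3": 380}))
```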



VII. Use of EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments
of positional segments of EST-related nucleic acids
to Construct Vectors and Uses Thereof

[0434] The present EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids may also be used to construct secretion vectors capable of directing the secretion of the proteins encoded by genes inserted therein. Such secretion vectors may facilitate the purification or enrichment of the proteins encoded by genes inserted therein by reducing the number of background proteins from which the desired protein must be purified or enriched. Exemplary secretion vectors are described in Example 46 below.

1. Construction of Vectors and Uses Thereof

EXAMPLE 46 



Construction of Secretion Vectors 






[0435] The secretion vectors of the present invention include a promoter capable of directing gene expression in the host cell, tissue, or organism of interest. Such promoters include the Rous Sarcoma Virus promoter, the SV40 promoter, the human cytomegalovirus promoter, and other promoters familiar to those skilled in the art.
[0436] A signal sequence from one of the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids is operably linked to the promoter such that the mRNA transcribed from the promoter will direct the translation of the signal peptide. Preferably, the signal sequence is from one of the nucleic acids of SEQ ID NOs. 24-3883. The host cell, tissue, or organism may be any cell, tissue, or organism which recognizes the signal peptide encoded by the signal sequence in the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids. Suitable hosts include mammalian cells, tissues or organisms, avian cells, tissues, or organisms, insect cells, tissues or organisms, or yeast.

[0437] In addition, the secretion vector contains clon- 
ing sites for inserting genes encoding the proteins which 
are to be secreted. The cloning sites facilitate the clon- 
ing of the insert gene in frame with the signal sequence 
such that a fusion protein in which the signal peptide is 
fused to the protein encoded by the inserted gene is ex- 
pressed from the mRNA transcribed from the promoter. 
The signal peptide directs the extracellular secretion of 
the fusion protein. 
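Cloning in frame with the signal sequence comes down to codon arithmetic. The sketch below uses a hypothetical signal sequence and linker. It reports how many bases would need to be added after the linker to keep the insert in frame. It also lists any in-frame stop codons upstream of the insert. The sketch assumes the signal sequence begins at its ATG.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def frame_fill(signal_seq: str, linker: str) -> int:
    """Bases to add after the linker so the insert's first codon is in frame
    with the signal sequence (read from its ATG at position 0)."""
    return (-(len(signal_seq) + len(linker))) % 3


def in_frame_stops(signal_seq: str, linker: str) -> list:
    """Offsets of stop codons read in frame across signal sequence + linker."""
    upstream = signal_seq + linker
    return [i for i in range(0, len(upstream) - 2, 3)
            if upstream[i:i + 3] in STOP_CODONS]


signal = "ATG" + "CTG" * 5   # hypothetical 18 bp signal sequence
print(frame_fill(signal, "GAATTC"), in_frame_stops(signal, "GAATTC"))  # 0 []
```

A nonzero fill value or any listed stop codon would prevent the signal peptide from being fused to the protein encoded by the inserted gene.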

[0438] The secretion vector may be DNA or RNA and 
may integrate into the chromosome of the host, be sta- 
bly maintained as an extrachromosomal replicon in the 
host, be an artificial chromosome, or be transiently 
present in the host. Preferably, the secretion vector is 
maintained in multiple copies in each host cell. As used 
herein, multiple copies means at least 2, 5, 10, 20, 25, 
50 or more than 50 copies per cell. In some embodi- 
ments, the multiple copies are maintained extrachromo- 
somally. In other embodiments, the multiple copies re- 
suit from amplification of a chromosomal sequence. 
[0439] Many nucleic acid backbones suitable for use as secretion vectors are known to those skilled in the art, including retroviral vectors, SV40 vectors, Bovine Papilloma Virus vectors, yeast integrating plasmids, yeast episomal plasmids, yeast artificial chromosomes, human artificial chromosomes, P element vectors, baculovirus vectors, or bacterial plasmids capable of being transiently introduced into the host.
[0440] The secretion vector may also contain a polyA signal such that the polyA signal is located downstream of the gene inserted into the secretion vector.
[0441] After the gene encoding the protein for which secretion is desired is inserted into the secretion vector, the secretion vector is introduced into the host cell, tissue, or organism using calcium phosphate precipitation, DEAE-Dextran, electroporation, liposome-mediated transfection, viral particles or as naked DNA. The protein encoded by the inserted gene is then purified or enriched from the supernatant using conventional techniques such as ammonium sulfate precipitation, immunoprecipitation, immunoaffinity chromatography, size exclusion chromatography, ion exchange chromatography, and HPLC. Alternatively, the secreted protein may be in a sufficiently enriched or pure state in the supernatant or growth media of the host to permit it to be used for its intended purpose without further enrichment.
[0442] The signal sequences may also be inserted into vectors designed for gene therapy. In such vectors, the signal sequence is operably linked to a promoter such that mRNA transcribed from the promoter encodes the signal peptide. A cloning site is located downstream of the signal sequence such that a gene encoding a protein whose secretion is desired may readily be inserted into the vector and fused to the signal sequence. The vector is introduced into an appropriate host cell. The protein expressed from the promoter is secreted extracellularly, thereby producing a therapeutic effect.



Fusion Vectors 

[0443] The EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids may be used to construct fusion vectors for the expression of chimeric polypeptides. The chimeric polypeptides comprise a first polypeptide portion and a second polypeptide portion. In the fusion vectors of the present invention, nucleic acids encoding the first polypeptide portion and the second polypeptide portion are joined in frame with one another so as to generate a nucleic acid encoding the chimeric polypeptide. The nucleic acid encoding the chimeric polypeptide is operably linked to a promoter which directs the expression of an mRNA encoding the chimeric polypeptide. The promoter may be in any of the expression vectors described herein, including those described in Examples 20 and 46.
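You can check that the two portions are joined in frame by translating the fused open reading frame. This sketch makes four assumptions. It uses the standard genetic code. The first and second ORFs are hypothetical. It drops the first portion's stop codon before joining. If a stop codon appears anywhere other than the end, the fusion is broken.

```python
BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order; '*' is a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
STOP_CODONS = {codon for codon, aa in CODON_TABLE.items() if aa == "*"}


def fuse_in_frame(first_orf: str, second_orf: str) -> str:
    """Join two ORFs so the second is read in the frame of the first,
    dropping the first ORF's stop codon if it has one."""
    if len(first_orf) % 3 or len(second_orf) % 3:
        raise ValueError("ORF lengths must be multiples of three")
    if first_orf[-3:] in STOP_CODONS:
        first_orf = first_orf[:-3]
    return first_orf + second_orf


def translate(orf: str) -> str:
    """Translate the complete codons of an ORF with the standard code."""
    return "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, len(orf) - 2, 3))


# Hypothetical portions: Met-Lys-Ala-stop and Gly-Phe-stop.
protein = translate(fuse_in_frame("ATGAAAGCTTAA", "GGTTTCTGA"))
print(protein)  # MKAGF*
```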

[0444] Preferably, the fusion vector is maintained in multiple copies in each host cell. In some embodiments, the multiple copies are maintained extrachromosomally. In other embodiments, the multiple copies result from amplification of a chromosomal sequence.

[0445] The first polypeptide portion may comprise any of the polypeptides encoded by the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids. In some embodiments, the first polypeptide portion may be one of the EST-related polypeptides, fragments of EST-related polypeptides, positional segments of EST-related polypeptides, or fragments of positional segments of EST-related polypeptides.
[0446] The second polypeptide portion may comprise any polypeptide of interest. In some embodiments, the second polypeptide portion may comprise a detectable polypeptide, such as green fluorescent protein, or a polypeptide having a detectable enzymatic activity, such as beta-galactosidase. Chimeric polypeptides in which the second polypeptide portion comprises a detectable polypeptide may be used to determine the intracellular localization of the first polypeptide portion. In such procedures, the fusion vector encoding the chimeric polypeptide is introduced into a host cell under conditions which facilitate the expression of the chimeric polypeptide. Where appropriate, the cells are treated with a detection reagent which is visible under the microscope following a catalytic reaction with the detectable polypeptide, and the cellular location of the detection reagent is determined. For example, if the polypeptide having a detectable enzymatic activity is beta-galactosidase, the cells may be treated with X-gal. Alternatively, where the detectable polypeptide is directly detectable without the addition of a detection reagent, the intracellular location of the chimeric polypeptide is determined by performing microscopy under conditions in which the detectable polypeptide is visible. For example, if the detectable polypeptide is green fluorescent protein or a modified version thereof, microscopy is performed by exposing the host cells to light having an appropriate wavelength to cause the green fluorescent protein or modified version thereof to fluoresce.
[0447] Alternatively, the second polypeptide portion may comprise a polypeptide whose isolation, purification, or enrichment is desired. In such embodiments, the isolation, purification, or enrichment of the second polypeptide portion may be achieved by performing the immunoaffinity chromatography procedures described below using an immunoaffinity column having an antibody directed against the first polypeptide portion coupled thereto.

[0448] The proteins encoded by the EST-related nu- 
cleic acids, positional segments of EST-related nucleic 
acids or fragments of positional segments of EST-relat- 
ed nucleic acids or the EST-related polypeptides, frag- 
ments of EST-related polypeptides, positional segments 
of EST-related polypeptides, or fragments of positional 
segments of EST-related polypeptides may also be 
used to generate antibodies as explained in Examples 
20 and 33 in order to identify the tissue type or cell spe- 
cies from which a sample is derived as described in Ex- 
ample 48. 

EXAMPLE 48 

Identification of Tissue Types or Cell Species by Means 
of Labeled Tissue Specific Antibodies 

[0449] Identification of specific tissues is accom- 
plished by the visualization of tissue specific antigens 
by means of antibody preparations according to Exam- 
ples 20 and 33 which are conjugated, directly or indi- 
rectly to a detectable marker. Selected labeled antibody 
species bind to their specific antigen binding partner in 
tissue sections, cell suspensions, or in extracts of solu- 
ble proteins from a tissue sample to provide a pattern 
for qualitative or semi-qualitative interpretation. 
[0450] Antisera for these procedures must have a potency exceeding that of the native preparation, and for that reason, antibodies are concentrated to a mg/ml level by isolation of the gamma globulin fraction, for example, by ion-exchange chromatography or by ammonium sulfate fractionation. Also, to provide the most specific antisera, unwanted antibodies, for example to common proteins, must be removed from the gamma globulin fraction, for example by means of insoluble immunoabsorbents, before the antibodies are labeled with the marker. Either monoclonal or heterologous antisera are suitable for either procedure.

1. Immunohistochemical Technique

[0451] Purified, high-titer antibodies, prepared as described above, are conjugated to a detectable marker, as described, for example, by Fudenberg, H., Chap. 26 in: Basic and Clinical Immunology, 3rd Ed., Lange, Los Altos, California (1980) or Rose, et al., Chap. 12 in: Methods in Immunodiagnosis, 2d Ed., John Wiley and Sons, New York (1980), the disclosures of which are incorporated herein by reference.
[0452] A fluorescent marker, either fluorescein or rhodamine, is preferred, but antibodies can also be labeled with an enzyme that supports a color producing reaction with a substrate, such as horseradish peroxidase. Markers can be added to tissue-bound antibody in a second step, as described below. Alternatively, the specific antitissue antibodies can be labeled with ferritin or other electron dense particles, and localization of the ferritin coupled antigen-antibody complexes achieved by means of an electron microscope. In yet another approach, the antibodies are radiolabeled, with, for example, 125I, and detected by overlaying the antibody treated preparation with photographic emulsion.
[0453] Preparations to carry out the procedures can comprise monoclonal or polyclonal antibodies to a single protein or peptide identified as specific to a tissue type, for example, brain tissue, or antibody preparations to several antigenically distinct tissue specific antigens can be used in panels, independently or in mixtures, as required.

[0454] Tissue sections and cell suspensions are prepared for immunohistochemical examination according to common histological techniques. Multiple cryostat sections (about 4 µm, unfixed) of the unknown tissue and known control are mounted, and each slide is covered with different dilutions of the antibody preparation. Sections of known and unknown tissues should also be treated with preparations to provide a positive control, a negative control, for example, pre-immune sera, and a control for non-specific staining, for example, buffer.
[0455] Treated sections are incubated in a humid chamber for 30 min at room temperature, rinsed, then washed in buffer for 30-45 min. Excess fluid is blotted away, and the marker developed.
[0456] If the tissue specific antibody was not labeled in the first incubation, it can be labeled at this time in a second antibody-antibody reaction, for example, by adding fluorescein- or enzyme-conjugated antibody against the immunoglobulin class of the antiserum-producing species, for example, fluorescein labeled antibody to mouse IgG. Such labeled sera are commercially available.

[0457] The antigen found in the tissues by the above procedure can be quantified by measuring the intensity of color or fluorescence on the tissue section, and calibrating that signal using appropriate standards.
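Calibrating a measured signal against standards is usually done with a standard curve. The sketch below fits a least-squares line to hypothetical standards and inverts it to estimate an unknown. It assumes the signal is linear in antigen amount across the range of the standards, and that assumption should be verified for any real assay. Units are arbitrary.

```python
from typing import Sequence, Tuple


def fit_standard_curve(amounts: Sequence[float], signals: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of signal = slope * amount + intercept."""
    n = len(amounts)
    if n < 2 or n != len(signals):
        raise ValueError("need at least two paired standards")
    mean_x = sum(amounts) / n
    mean_y = sum(signals) / n
    sxx = sum((x - mean_x) ** 2 for x in amounts)
    if sxx == 0:
        raise ValueError("standards must span more than one amount")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(amounts, signals))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def antigen_amount(signal: float, slope: float, intercept: float) -> float:
    """Invert the standard curve to estimate antigen in an unknown section."""
    return (signal - intercept) / slope


# Hypothetical standards (amount, measured intensity) in arbitrary units.
slope, intercept = fit_standard_curve([0, 10, 20, 40], [5, 25, 45, 85])
print(antigen_amount(65, slope, intercept))  # 30.0
```

Do not extrapolate beyond the highest standard, because staining intensity commonly saturates.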

2. Identification of Tissue Specific Soluble Proteins

[0458] The visualization of tissue specific proteins and identification of unknown tissues from that procedure is carried out using the labeled antibody reagents and detection strategy as described for immunohistochemistry; however, the sample is prepared according to an electrophoretic technique to distribute the proteins extracted from the tissue in an orderly array on the basis of molecular weight for detection.
[0459] A tissue sample is homogenized using a Virtis apparatus; cell suspensions are disrupted by Dounce homogenization or osmotic lysis, using detergents in either case as required to disrupt cell membranes, as is the practice in the art. Insoluble cell components such as nuclei, microsomes, and membrane fragments are removed by ultracentrifugation, and the soluble protein-containing fraction concentrated if necessary and reserved for analysis.

[0460] A sample of the soluble protein solution is resolved into individual protein species by conventional SDS polyacrylamide electrophoresis as described, for example, by Davis, L. et al., Section 19-2 in: Basic Methods in Molecular Biology (P. Leder, ed.), Elsevier, New York (1986), the disclosure of which is incorporated herein by reference, using a range of amounts of polyacrylamide in a set of gels to resolve the entire molecular weight range of proteins to be detected in the sample. A size marker is run in parallel for purposes of estimating molecular weights of the constituent proteins. Sample size for analysis is a convenient volume of from 5 to 55 µl, containing from about 1 to 100 µg protein. An aliquot of each of the resolved proteins is transferred by blotting to a nitrocellulose filter paper, a process that maintains the pattern of resolution. Multiple copies are prepared. The procedure, known as Western Blot Analysis, is well described in Davis, L. et al., supra, Section 19-3. One set of nitrocellulose blots is stained with Coomassie Blue dye to visualize the entire set of proteins for comparison with the antibody bound proteins. The remaining nitrocellulose filters are then incubated with a solution of one or more specific antisera to tissue specific proteins prepared as described in Examples 20 and 33. In this procedure, as in the immunohistochemical procedure above, appropriate positive and negative sample and reagent controls are run.

[0461] In either procedure described above, a detectable label can be attached to the primary tissue antigen-primary antibody complex according to various strategies and permutations thereof. In a straightforward approach, the primary specific antibody can be labeled; alternatively, the unlabeled complex can be bound by a labeled secondary anti-IgG antibody. In other approaches, either the primary or secondary antibody is conjugated to a biotin molecule, which can, in a subsequent step, bind an avidin conjugated marker. According to yet another strategy, enzyme labeled or radioactive protein A, which has the property of binding to any IgG, is bound in a final step to either the primary or secondary antibody.

EXAMPLE 49 

Immunohistochemical Localization of Polypeptides

[0462] The antibodies prepared as described in Ex- 
amples 20 and 33 above may be utilized to determine 
the cellular location of a polypeptide. The polypeptide 
may be any of the polypeptides encoded by EST-related 
nucleic acids, positional segments of EST-related nu- 
cleic acids or fragments of positional segments of EST- 
related nucleic acids or the polypeptide may be one of 
the EST-related polypeptides, fragments of EST-related 
polypeptides, positional segments of EST-related 
polypeptides, or fragments of positional segments of 
EST-related polypeptides. In some embodiments, the 
polypeptide may be a chimeric polypeptide such as 
those encoded by the fusion vectors of Example 47. 
[0463] Cells expressing the polypeptide to be local- 
ized are applied to a microscope slide and fixed using 
any of the procedures typically employed in immunohis- 
tochemical localization techniques, including the meth- 
ods described in Current Protocols in Molecular Biology, 
John Wiley and Sons, Inc. 1997. Following a washing 
step, the cells are contacted with the antibody. In some 
embodiments, the antibody is conjugated to a detecta- 
ble marker as described above to facilitate detection. Al- 
ternatively, in some embodiments, after the cells have 
been contacted with an antibody to the polypeptide to 
be localized, a secondary antibody which has been con- 
jugated to a detectable marker is placed in contact with 
the antibody against the polypeptide to be localized.
[0464] Thereafter, microscopy is performed under 
conditions suitable for visualizing the cellular location of 
the polypeptide. 

[0465] The visualization of tissue specific antigen 
binding at levels above those seen in control tissues to 
one or more tissue specific antibodies, directed against 
the polypeptides encoded by EST-related nucleic acids, 
positional segments of EST-related nucleic acids or 
fragments of positional segments of EST-related nucleic 
acids or antibodies against the EST-related polypep- 
tides, fragments of EST-related polypeptides, positional 
segments of EST-related polypeptides, or fragments of 
positional segments of EST-related polypeptides, can 
identify tissues of unknown origin, for example, forensic 
samples, or differentiated tumor tissue that has metas- 
tasized to foreign bodily sites. 

[0466] The antibodies of Examples 20 and 33 may also be used in the immunoaffinity chromatography techniques described below to isolate, purify or enrich the polypeptides encoded by the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids or to isolate, purify or enrich EST-related polypeptides, fragments of EST-related polypeptides, positional segments of EST-related polypeptides, or fragments of positional segments of EST-related polypeptides. The immunoaffinity chromatography techniques described below may also be used to isolate, purify or enrich polypeptides which have been linked to the polypeptides encoded by the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids or to isolate, purify or enrich polypeptides which have been linked to EST-related polypeptides, fragments of EST-related polypeptides, positional segments of EST-related polypeptides, or fragments of positional segments of EST-related polypeptides.

EXAMPLE 50 

Immunoaffinity Chromatography 

[0467] Antibodies prepared as described above are 
coupled to a support. Preferably, the antibodies are 
monoclonal antibodies, but polyclonal antibodies may 
also be used. The support may be any of those typically 
employed in immunoaffinity chromatography, including 
Sepharose CL-4B (Pharmacia, Piscataway, NJ), 
Sepharose CL-2B (Pharmacia, Piscataway, NJ), Affi-Gel
10 (Bio-Rad, Richmond, CA), or glass beads.
[0468] The antibodies may be coupled to the support 
using any of the coupling reagents typically used in im- 
munoaffinity chromatography, including cyanogen bro- 
mide. After coupling the antibody to the support, the sup- 
port is contacted with a sample which contains a target 
polypeptide whose isolation, purification or enrichment 
is desired. The target polypeptide may be a polypeptide 
encoded by the EST-related nucleic acids, positional 
segments of EST-related nucleic acids or fragments of 
positional segments of EST-related nucleic acids or the 
target polypeptide may be one of the EST-related 
polypeptides, fragments of EST-related polypeptides, 
positional segments of EST-related polypeptides, or 
fragments of positional segments of EST-related 
polypeptides. The target polypeptides may also be
polypeptides which have been linked to the polypep- 
tides encoded by the EST-related nucleic acids, posi- 
tional segments of EST-related nucleic acids or frag- 
ments of positional segments of EST-related nucleic ac- 
ids or the target polypeptides may be polypeptides 
which have been linked to EST-related polypeptides, 
fragments of EST-related polypeptides, positional seg- 
ments of EST-related polypeptides, or fragments of po- 
sitional segments of EST-related polypeptides using the 
fusion vectors described above. 
[0469] Preferably, the sample is placed in contact with
the support for a sufficient amount of time and under 
appropriate conditions to allow at least 50% of the target 
polypeptide to specifically bind to the antibody coupled 
to the support. 



[0470] Thereafter, the support is washed with an
appropriate wash solution to remove polypeptides which
have non-specifically adhered to the support. The wash
solution may be any of those typically employed in
immunoaffinity chromatography, including PBS,
Tris-lithium chloride buffer (0.1 M lysine base and 0.5 M
lithium chloride, pH 8.0), Tris-hydrochloride buffer
(0.06 M Tris-hydrochloride, pH 8.0), or Tris/Triton/NaCl
buffer (50 mM Tris-Cl, pH 8.0 or 9.0, 0.1% Triton X-100,
and 0.5 M NaCl).
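As an illustration of the buffer compositions listed above, the following minimal sketch computes how much solute is weighed out for the Tris/Triton/NaCl wash buffer. It is not part of the described method: the molar masses are standard values for the anhydrous compounds, treating 0.1% Triton X-100 as volume/volume is an assumption, and the function names are hypothetical. The pH (8.0 or 9.0) would still be set by titrating with HCl after the Tris base is dissolved.

```python
# Standard molar masses (g/mol) of the anhydrous compounds.
MOLAR_MASS_G_PER_MOL = {
    "Tris base": 121.14,
    "NaCl": 58.44,
    "LiCl": 42.39,
}


def grams_needed(compound: str, molarity_m: float, volume_l: float) -> float:
    """Mass of solute (g) for `volume_l` litres at `molarity_m` mol/L."""
    return MOLAR_MASS_G_PER_MOL[compound] * molarity_m * volume_l


def tris_triton_nacl_recipe(volume_l: float = 1.0) -> dict:
    """Weigh-out for 50 mM Tris-Cl, 0.5 M NaCl, 0.1% Triton X-100.

    Assumption: Triton X-100 is dosed as 0.1% (v/v). The pH adjustment
    with HCl is not modelled here.
    """
    return {
        "Tris base (g)": round(grams_needed("Tris base", 0.050, volume_l), 2),
        "NaCl (g)": round(grams_needed("NaCl", 0.5, volume_l), 2),
        "Triton X-100 (mL)": round(0.1 / 100 * volume_l * 1000, 2),
    }


if __name__ == "__main__":
    print(tris_triton_nacl_recipe(0.5))  # weigh-out for 500 mL
    print(round(grams_needed("LiCl", 0.5, 1.0), 2))  # LiCl for 1 L of 0.5 M
```

The same `grams_needed` helper covers the lithium chloride component of the Tris-lithium chloride buffer.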
[0471] After washing, the specifically bound target
polypeptide is eluted from the support using the high pH
or low pH elution solutions typically employed in
immunoaffinity chromatography. In particular, the elution
solutions may contain an eluant such as triethanolamine,
diethylamine, calcium chloride, sodium thiocyanate,
potassium bromide, acetic acid, or glycine. In some
embodiments, the elution solution may also contain a
detergent such as Triton X-100 or octyl-beta-D-glucoside.
[0472] The EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids may also
be used to clone sequences located upstream of the
5'ESTs which are capable of regulating gene expression,
including promoter sequences, enhancer sequences, and
other upstream sequences which influence transcription
or translation levels. Once identified and cloned, these
upstream regulatory sequences may be used in expression
vectors designed to direct the expression of an inserted
gene in a desired spatial, temporal, developmental, or
quantitative fashion. Example 51 describes a method for
cloning sequences upstream of the EST-related nucleic
acids, positional segments of EST-related nucleic acids
or fragments of positional segments of EST-related
nucleic acids.

2. Identification of upstream sequences with promoting
or regulatory activities

EXAMPLE 51

Use of EST-related nucleic acids, positional segments
of EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids to Clone
Upstream Sequences from Genomic DNA

[0473] Sequences derived from EST-related nucleic
acids, positional segments of EST-related nucleic acids
or fragments of positional segments of EST-related
nucleic acids may be used to isolate the promoters of the
corresponding genes using chromosome walking
techniques. In one chromosome walking technique, which
utilizes the GenomeWalker kit available from Clontech,
five complete genomic DNA samples are each digested
with a different restriction enzyme which has a 6-base
recognition site and leaves a blunt end. Following
digestion, oligonucleotide adapters are ligated to each
end of the resulting genomic DNA fragments.
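The library construction step can be previewed in silico by predicting which blunt fragments each enzyme would produce from a known stretch of sequence. The sketch below is illustrative only: the text does not name the enzymes supplied with the GenomeWalker kit, so the five-enzyme panel is an assumption chosen because each enzyme recognizes a palindromic 6-bp site and cuts bluntly between its third and fourth base. The function names are hypothetical.

```python
import re

# Hypothetical enzyme panel (assumption): blunt cutters with palindromic
# 6-bp sites that cleave between the 3rd and 4th base of the site.
BLUNT_SIX_CUTTERS = {
    "DraI": "TTTAAA",
    "EcoRV": "GATATC",
    "PvuII": "CAGCTG",
    "ScaI": "AGTACT",
    "StuI": "AGGCCT",
}
CUT_OFFSET = 3  # blunt cut in the middle of each 6-bp site


def cut_positions(seq: str, site: str) -> list[int]:
    """0-based cut coordinates; the sites are palindromic, so one strand suffices."""
    return [m.start() + CUT_OFFSET for m in re.finditer(site, seq.upper())]


def fragment_lengths(seq: str, site: str) -> list[int]:
    """Lengths of the blunt fragments produced by a complete digest."""
    bounds = [0] + cut_positions(seq, site) + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def digest_panel(seq: str) -> dict[str, list[int]]:
    """Fragment lengths for each enzyme, one entry per separate library."""
    return {name: fragment_lengths(seq, site)
            for name, site in BLUNT_SIX_CUTTERS.items()}


if __name__ == "__main__":
    print(digest_panel("AAAAGATATCCCCC"))
```

Digesting each genomic sample with a different enzyme gives each library a different fragment end near the gene, which is why several libraries are prepared in parallel.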
[0474] For each of the five genomic DNA libraries, a
first PCR reaction is performed according to the
manufacturer's instructions (which are incorporated
herein by reference) using an outer adapter primer
provided in the kit and an outer gene-specific primer.
The gene-specific primer should be selected to be
specific for the 5'EST of interest and should have a
melting temperature, length, and location in the
EST-related nucleic acids, positional segments of
EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids which are
consistent with its use in PCR reactions. Each first PCR
reaction contains 5 ng of genomic DNA, 5 µl of 10X Tth
reaction buffer, 0.2 mM of each dNTP, 0.2 µM each of
outer adapter primer and outer gene-specific primer,
1.1 mM of Mg(OAc)2, and 1 µl of the Tth polymerase 50X
mix in a total volume of 50 µl. The reaction cycle for
the first PCR reaction is as follows: 1 min at 94°C /
2 sec at 94°C, 3 min at 72°C (7 cycles) / 2 sec at 94°C,
3 min at 67°C (32 cycles) / 5 min at 67°C.
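The final concentrations listed above translate into pipetting volumes through C1V1 = C2V2. The minimal sketch below shows that arithmetic for one 50 µl first-round reaction. The stock concentrations (10 mM dNTPs, 10 µM primers, 25 mM Mg(OAc)2) and the 1 µl template volume are assumptions for illustration, not values given in the text, and the function names are hypothetical.

```python
def stock_volume(final_conc: float, stock_conc: float, total_ul: float) -> float:
    """Solve C1*V1 = C2*V2 for V1 (both concentrations in the same units)."""
    return final_conc * total_ul / stock_conc


def first_pcr_mix(total_ul: float = 50.0, template_ul: float = 1.0) -> dict:
    """Volumes (µl) for one first-round reaction; stock values are ASSUMED."""
    components = {
        # name: (final concentration from the text, assumed stock concentration)
        "10X Tth buffer": (1.0, 10.0),               # "X" units
        "dNTP mix (each)": (0.2, 10.0),              # mM
        "outer adapter primer": (0.2, 10.0),         # µM
        "outer gene-specific primer": (0.2, 10.0),   # µM
        "Mg(OAc)2": (1.1, 25.0),                     # mM
        "50X Tth polymerase mix": (1.0, 50.0),       # "X" units
    }
    mix = {name: round(stock_volume(final, stock, total_ul), 2)
           for name, (final, stock) in components.items()}
    # Volume holding the 5 ng of genomic DNA (assumed template concentration).
    mix["template (5 ng genomic DNA)"] = template_ul
    water = total_ul - sum(mix.values())
    if water < 0:
        raise ValueError("components exceed the reaction volume")
    mix["water"] = round(water, 2)
    return mix


if __name__ == "__main__":
    for name, volume in first_pcr_mix().items():
        print(f"{name}: {volume} µl")
```

Because the second-round reaction has the same composition apart from the primers, the same function applies to it with the nested primers substituted.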

[0475] The product of the first PCR reaction is diluted
and used as a template for a second PCR reaction
according to the manufacturer's instructions using a pair
of nested primers which are located internally on the
amplicon resulting from the first PCR reaction. For
example, 5 µl of the reaction product of the first PCR
reaction mixture may be diluted 180 times. Reactions are
made in a 50 µl volume having a composition identical to
that of the first PCR reaction except that the nested
primers are used. The first nested primer is specific for
the adapter, and is provided with the GenomeWalker kit.
The second nested primer is specific for the particular
EST-related nucleic acids, positional segments of
EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids for which the
promoter is to be cloned, and should have a melting
temperature, length, and location in the EST-related
nucleic acids, positional segments of EST-related
nucleic acids or fragments of positional segments of
EST-related nucleic acids which are consistent with its
use in PCR reactions. The reaction parameters of the
second PCR reaction are as follows: 1 min at 94°C /
2 sec at 94°C, 3 min at 72°C (6 cycles) / 2 sec at 94°C,
3 min at 67°C (25 cycles) / 5 min at 67°C. The product of
the second PCR reaction is purified, cloned, and
sequenced using standard techniques.
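The two cycling programs and the 180-fold dilution described above can be written down as data, which makes the cycle counts and timings easy to check. This is a sketch only: the `Stage` structure and the function names are hypothetical, and the run-time estimate deliberately ignores temperature ramping, so an actual instrument run will take longer.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    repeats: int
    steps: list[tuple[int, int]]  # (hold time in seconds, temperature in °C)


# Programs transcribed from the text.
FIRST_PCR = [
    Stage(1, [(60, 94)]),
    Stage(7, [(2, 94), (180, 72)]),
    Stage(32, [(2, 94), (180, 67)]),
    Stage(1, [(300, 67)]),
]
SECOND_PCR = [
    Stage(1, [(60, 94)]),
    Stage(6, [(2, 94), (180, 72)]),
    Stage(25, [(2, 94), (180, 67)]),
    Stage(1, [(300, 67)]),
]


def amplification_cycles(program: list[Stage]) -> int:
    """Count the repeated two-step (denature, anneal/extend) cycles."""
    return sum(s.repeats for s in program if len(s.steps) > 1)


def hold_time_min(program: list[Stage]) -> float:
    """Minutes spent at the programmed temperatures; ramping is ignored."""
    return sum(s.repeats * sum(sec for sec, _ in s.steps) for s in program) / 60


def dilution_volumes(sample_ul: float, fold: float) -> tuple[float, float]:
    """Return (final volume, diluent to add) for a `fold`-times dilution."""
    final = sample_ul * fold
    return final, final - sample_ul


if __name__ == "__main__":
    for name, prog in (("first", FIRST_PCR), ("second", SECOND_PCR)):
        print(name, amplification_cycles(prog), round(hold_time_min(prog), 1))
    print(dilution_volumes(5, 180))
```

For the example in the text, diluting 5 µl 180 times means bringing it to 900 µl, that is, adding 895 µl of diluent.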
[0476] Alternatively, two or more human genomic DNA
libraries can be constructed by using two or more
restriction enzymes. The digested genomic DNA is cloned
into vectors which can be converted into single
stranded, circular, or linear DNA. A biotinylated
oligonucleotide comprising at least 15 nucleotides from
the EST-related nucleic acids, positional segments of
EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids sequence is
hybridized to the single stranded DNA. Hybrids between
the biotinylated oligonucleotide and the single stranded
DNA containing the EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids are
isolated as described above. Thereafter, the single
stranded DNA containing the EST-related nucleic acids,
positional segments of EST-related nucleic acids or
fragments of positional segments of EST-related nucleic
acids is released from the beads and converted into
double stranded DNA using a primer specific for the
EST-related nucleic acids, positional segments of
EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids or a primer
corresponding to a sequence included in the cloning
vector. The resulting double stranded DNA is transformed
into bacteria. cDNAs containing the EST-related nucleic
acids, positional segments of EST-related nucleic acids
or fragments of positional segments of EST-related
nucleic acids are identified by colony PCR or colony
hybridization.

Once the upstream genomic sequences have been cloned
and sequenced as described above, prospective promoters
and transcription start sites within the upstream
sequences may be identified by comparing the sequences
upstream of the EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids with
databases containing known transcription start sites,
transcription factor binding sites, or promoter
sequences.

[0477] In addition, promoters in the upstream se- 
quences may be identified using promoter reporter vec- 
tors as described in Example 52. 

EXAMPLE 52

Identification of Promoters in Cloned Upstream
Sequences

[0478] The genomic sequences upstream of the
EST-related nucleic acids, positional segments of
EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids are cloned into a
suitable promoter reporter vector, such as the
pSEAP-Basic, pSEAP-Enhancer, pβgal-Basic,
pβgal-Enhancer, or pEGFP-1 Promoter Reporter vectors
available from Clontech. Briefly, each of these promoter
reporter vectors includes multiple cloning sites
positioned upstream of a reporter gene encoding a
readily assayable protein such as secreted alkaline
phosphatase, β-galactosidase, or green fluorescent
protein. The sequences upstream of the EST-related
nucleic acids, positional segments of EST-related
nucleic acids or fragments of positional segments of
EST-related nucleic acids are inserted into the cloning
sites upstream of the reporter gene in both orientations
and introduced into an appropriate host cell. The level
of reporter protein is assayed and compared to the level
obtained from a vector which lacks an insert in the
cloning site. An elevated expression level from the
vector containing the insert, relative to the control
vector, indicates the presence of a promoter in the
insert. If necessary, the upstream sequences can be
cloned into vectors which contain an enhancer for
augmenting transcription levels from weak promoter
sequences. A significant level of expression above that
observed with the vector lacking an insert indicates
that a promoter sequence is present in the inserted
upstream sequence.
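The comparison against the insert-less control vector can be expressed as a fold-change calculation for each insert orientation. The sketch below is a simplified illustration: the 2-fold `threshold` is a hypothetical cutoff (the text only asks for a significant elevation over the empty vector), the readings in the example are invented for demonstration, and a real analysis would normally also include replicate statistics. The function names are hypothetical.

```python
from statistics import mean


def fold_over_empty(insert_signal: list[float], empty_signal: list[float]) -> float:
    """Mean reporter signal with the insert divided by the empty-vector mean."""
    return mean(insert_signal) / mean(empty_signal)


def score_upstream_fragment(forward: list[float], reverse: list[float],
                            empty: list[float], threshold: float = 2.0) -> dict:
    """Compare both insert orientations with the insert-less control.

    `threshold` is a hypothetical cutoff used only for illustration.
    """
    fwd = fold_over_empty(forward, empty)
    rev = fold_over_empty(reverse, empty)
    return {"forward_fold": fwd, "reverse_fold": rev,
            "promoter_activity": max(fwd, rev) >= threshold}


if __name__ == "__main__":
    # Invented replicate readings (arbitrary units), for demonstration only.
    print(score_upstream_fragment(forward=[900, 1100, 1000],
                                  reverse=[120, 100, 110],
                                  empty=[100, 110, 90]))
```

Scoring the two orientations separately mirrors the text's instruction to clone each upstream fragment in both orientations.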
[0479] Appropriate host cells for the promoter reporter
vectors may be chosen based on the results of the
above-described determination of expression patterns of
the EST-related nucleic acids, positional segments of
EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids. For example, if
the expression pattern analysis indicates that the mRNA
corresponding to a particular EST-related nucleic acid,
positional segment of an EST-related nucleic acid or
fragment of a positional segment of an EST-related
nucleic acid is expressed in fibroblasts, the promoter
reporter vector may be introduced into a human
fibroblast cell line.

[0480] Promoter sequences within the upstream genomic
DNA may be further defined by constructing nested
deletions in the upstream DNA using conventional
techniques such as Exonuclease III digestion. The
resulting deletion fragments can be inserted into the
promoter reporter vector to determine whether the
deletion has reduced or obliterated promoter activity.
In this way, the boundaries of the promoters may be
defined. If desired, potential individual regulatory
sites within the promoter may be identified using site
directed mutagenesis or linker scanning to obliterate
potential transcription factor binding sites within the
promoter individually or in combination. The effects of
these mutations on transcription levels may be
determined by inserting the mutations into the cloning
sites in the promoter reporter vectors.
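Reading a promoter boundary off a nested deletion series amounts to finding where activity first drops. The minimal sketch below takes each deletion construct as a pair of (bp of upstream DNA retained, measured reporter activity) and reports the interval in which the 5' boundary is predicted to lie. The 50% `fraction` cutoff and the example values are hypothetical, and the approach assumes activity falls off roughly monotonically as the deletions proceed.

```python
def map_boundary(constructs: list[tuple[int, float]], fraction: float = 0.5):
    """Locate the promoter's 5' boundary from a nested deletion series.

    constructs: (bp of upstream DNA retained, reporter activity) pairs.
    fraction: hypothetical cutoff relative to the longest construct.
    Returns (shorter, longer): the boundary is predicted to lie between these
    two endpoints, or None if activity never falls below the cutoff.
    """
    ordered = sorted(constructs, key=lambda c: c[0], reverse=True)  # longest first
    full_activity = ordered[0][1]
    last_active = ordered[0][0]
    for length, activity in ordered[1:]:
        if activity < fraction * full_activity:
            return length, last_active
        last_active = length
    return None


if __name__ == "__main__":
    # Hypothetical deletion series, for illustration only.
    series = [(1000, 100.0), (600, 95.0), (300, 90.0), (150, 20.0), (50, 5.0)]
    print(map_boundary(series))
```

In the example the activity collapses between the 300 bp and 150 bp constructs, so that interval would be the target for finer deletions, site-directed mutagenesis, or linker scanning.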


EXAMPLE 53 

Cloning and Identification of Promoters 

[0481] Using the method described in Example 51 above
with 5' ESTs, sequences upstream of several genes were
obtained. Using the primer pair GGG AAG ATG GAG ATA GTA
TTG CCT G (SEQ ID NO. 15) and CTG CCA TGT ACA TGA TAG
AGA GAT TC (SEQ ID NO. 16), the promoter having the
internal designation P13H2 (SEQ ID NO. 17) was obtained.
[0482] Using the primer pair GTA CCA GGGG ACT GTG ACC
ATT GC (SEQ ID NO. 18) and CTG TGA CCA TTG CTC CCA AGA
GAG (SEQ ID NO. 19), the promoter having the internal
designation P15B4 (SEQ ID NO. 20) was obtained.

[0483] Using the primer pair CTG GGA TGG AAG GCA CGG
TA (SEQ ID NO. 21) and GAG ACC ACA CAG CTA GAC AA (SEQ
ID NO. 22), the promoter having the internal designation
P29B6 (SEQ ID NO. 23) was obtained.
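The gene-specific primers are only required to have a melting temperature and length consistent with PCR use; the text does not say how Tm was evaluated. As a rough check, the sketch below applies two common approximations to the primers listed above: the Wallace rule, 2(A+T) + 4(G+C), for oligos shorter than 14 nt, and the "basic" GC-content formula, Tm = 64.9 + 41(nGC - 16.4)/N, otherwise. Both ignore salt and nearest-neighbour effects, so these are estimates under stated assumptions rather than the values used by the inventors; the function names are hypothetical.

```python
# Primer sequences as listed in paragraphs [0481]-[0483], spaces removed.
PRIMERS = {
    "SEQ ID NO. 15": "GGGAAGATGGAGATAGTATTGCCTG",
    "SEQ ID NO. 16": "CTGCCATGTACATGATAGAGAGATTC",
    "SEQ ID NO. 18": "GTACCAGGGGACTGTGACCATTGC",
    "SEQ ID NO. 19": "CTGTGACCATTGCTCCCAAGAGAG",
    "SEQ ID NO. 21": "CTGGGATGGAAGGCACGGTA",
    "SEQ ID NO. 22": "GAGACCACACAGCTAGACAA",
}


def gc_fraction(seq: str) -> float:
    """Fraction of G + C bases in the primer."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def basic_tm(seq: str) -> float:
    """Rough Tm in °C: Wallace rule below 14 nt, basic GC formula otherwise.

    No salt, primer-concentration or nearest-neighbour corrections.
    """
    seq = seq.upper()
    n = len(seq)
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if n < 14:
        return 2 * at + 4 * gc
    return 64.9 + 41 * (gc - 16.4) / n


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name}: {len(seq)} nt, GC {gc_fraction(seq):.0%}, "
              f"Tm ~{basic_tm(seq):.1f} °C")
```

A nearest-neighbour calculation with the actual reaction's salt and Mg2+ concentrations would give more reliable values than either approximation.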

[0484] Figure 4 provides a schematic description of
the promoters isolated and the way they are assembled
with the corresponding 5' tags. The upstream sequences
were screened for the presence of motifs resembling
transcription factor binding sites or known
transcription start sites using the computer program
MatInspector release 2.0, August 1996.

[0485] Figure 5 describes the transcription factor
binding sites present in each of these promoters. The
column labeled "matrice" provides the name of the
MatInspector matrix used. The column labeled "position"
provides the 5' position of the promoter site. Numbering
of the sequence starts from the transcription start site
as determined by matching the genomic sequence with the
5' EST sequence. The column labeled "orientation"
indicates the DNA strand on which the site is found, with
the + strand being the coding strand as determined by
matching the genomic sequence with the sequence of the
5' EST. The column labeled "score" provides the
MatInspector score found for this site. The column
labeled "length" provides the length of the site in
nucleotides. The column labeled "sequence" provides the
sequence of the site found.
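To show how the position, orientation, length, and sequence columns relate to the genomic sequence and the transcription start site, here is a simplified scanner. It is not MatInspector: it substitutes exact IUPAC consensus matching for MatInspector's weight-matrix scoring, so it produces no score. It reports positions as the leftmost + strand base of each hit relative to the start site (start site = 0, negative = upstream), which is an assumed convention that may differ from the one used in Figure 5. The TATA-like consensus in the example is likewise an illustrative choice, and all function names are hypothetical.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "W": "[AT]", "S": "[CG]", "K": "[GT]", "M": "[AC]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGTRYWSKMN", "TGCAYRWSMKN")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def consensus_regex(consensus: str) -> str:
    return "".join(IUPAC[base] for base in consensus.upper())


def scan_consensus(genomic: str, consensus: str, tss_index: int) -> list[dict]:
    """Exact IUPAC-consensus hits on both strands of `genomic`.

    position: leftmost + strand base of the hit minus `tss_index`.
    sequence: the site read 5'->3' on the strand where it was found.
    """
    genomic = genomic.upper()
    size = len(consensus)
    hits = []
    for strand, motif in (("+", consensus.upper()), ("-", revcomp(consensus.upper()))):
        # A lookahead lets overlapping hits be reported.
        for m in re.finditer(f"(?={consensus_regex(motif)})", genomic):
            start = m.start()
            site = genomic[start:start + size]
            hits.append({"position": start - tss_index, "orientation": strand,
                         "length": size,
                         "sequence": site if strand == "+" else revcomp(site)})
    return sorted(hits, key=lambda h: (h["position"], h["orientation"]))


if __name__ == "__main__":
    print(scan_consensus("CCCTATAAATGGG", "TATAWAW", tss_index=10))
```

A palindromic consensus matches both strands at the same place, so such sites appear twice in the output, once per orientation.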

[0486] Bacterial clones containing plasmids containing
the promoter sequences described above are presently
stored in the inventor's laboratories under the internal
identification numbers provided above. The inserts may
be recovered from the deposited materials by growing an
aliquot of the appropriate bacterial clone in the
appropriate medium. The plasmid DNA can then be isolated
using plasmid isolation procedures familiar to those
skilled in the art, such as alkaline lysis minipreps or
large scale alkaline lysis plasmid isolation procedures.
If desired, the plasmid DNA may be further enriched by
centrifugation on a cesium chloride gradient, size
exclusion chromatography, or anion exchange
chromatography. The plasmid DNA obtained using these
procedures may then be manipulated using standard
cloning techniques familiar to those skilled in the art.
Alternatively, a PCR can be done with primers designed
at both ends of the inserted EST-related nucleic acids,
positional segments of EST-related nucleic acids or
fragments of positional segments of EST-related nucleic
acids. The PCR product which corresponds to the
EST-related nucleic acids, positional segments of
EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids can then be
manipulated using standard cloning techniques familiar
to those skilled in the art.

[0487] The promoters and other regulatory sequences
located upstream of the EST-related nucleic acids,
positional segments of EST-related nucleic acids or
fragments of positional segments of EST-related nucleic
acids may be used to design expression vectors capable
of directing the expression of an inserted gene in a
desired spatial, temporal, developmental, or
quantitative manner. A promoter capable of directing the
desired spatial, temporal, developmental, and
quantitative patterns may be selected using the results
of the expression analysis described above. For example,
if a promoter which confers a high level of expression
in muscle is desired, the promoter sequence upstream of
EST-related nucleic acids, positional segments of
EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids derived from an
mRNA which is expressed at a high level in muscle, as
determined by the methods above, may be used in the
expression vector.

[0488] Preferably, the desired promoter is placed near 
multiple restriction sites to facilitate the cloning of the 
desired insert downstream of the promoter, such that the 
promoter is able to drive expression of the inserted 
gene. The promoter may be inserted in conventional nu- 
cleic acid backbones designed for extrachromosomal 
replication, integration into the host chromosomes or
transient expression. Suitable backbones for the 
present expression vectors include retroviral back- 
bones, backbones from eukaryotic episomes such as 
SV40 or Bovine Papilloma Virus, backbones from bac- 
terial episomes, or artificial chromosomes. 
[0489] Preferably, the expression vectors also include
a polyA signal downstream of the multiple restriction
sites for directing the polyadenylation of mRNA
transcribed from the gene inserted into the expression
vector.

[0490] Following the identification of promoter se- 
quences using the procedures of Examples 51-53, pro- 
teins which interact with the promoter may be identified 
as described in Example 54 below. 

EXAMPLE 54 

Identification of Proteins Which Interact with Promoter 
Sequences, Upstream Regulatory Sequences, or 
mRNA 

[0491] Sequences within the promoter region which 
are likely to bind transcription factors may be identified 
by homology to known transcription factor binding sites 
or through conventional mutagenesis or deletion analy- 
ses of reporter plasmids containing the promoter se- 
quence. For example, deletions may be made in a re- 
porter plasmid containing the promoter sequence of in- 
terest operably linked to an assayable reporter gene. 
The reporter plasmids carrying various deletions within 
the promoter region are transfected into an appropriate 
host cell and the effects of the deletions on expression
levels are assessed. Transcription factor binding sites
within the regions in which deletions reduce expression 
levels may be further localized using site directed mu- 
tagenesis, linker scanning analysis, or other techniques 
familiar to those skilled in the art. 
[0492] Nucleic acids encoding proteins which interact
with sequences in the promoter may be identified using
one-hybrid systems such as those described in the manual
accompanying the Matchmaker One-Hybrid System kit
available from Clontech (Catalog No. K1603-1), the
disclosure of which is incorporated herein by reference.
Briefly, the Matchmaker One-Hybrid System is used as
follows. The target sequence for which it is desired to
identify binding proteins is cloned upstream of a
selectable reporter gene and integrated into the yeast
genome. Preferably, multiple copies of the target
sequences are inserted into the reporter plasmid in
tandem. A library comprised of fusions between cDNAs to
be evaluated for the ability to bind to the promoter and
the activation domain of a yeast transcription factor,
such as GAL4, is transformed into the yeast strain
containing the integrated reporter sequence. The yeast
are plated on selective media to select cells expressing
the selectable marker linked to the promoter sequence.
The colonies which grow on the selective media contain
genes encoding proteins which bind the target sequence.
The inserts in the genes encoding the fusion proteins
are further characterized by sequencing. In addition,
the inserts may be inserted into expression vectors or
in vitro transcription vectors. Binding of the
polypeptides encoded by the inserts to the promoter DNA
may be confirmed by techniques familiar to those skilled
in the art, such as gel shift analysis or DNase
protection analysis.

VIII. Use of EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments
of positional segments of EST-related nucleic acids
in Gene Therapy

[0493] The present invention also comprises the use of
EST-related nucleic acids, positional segments of
EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids in gene therapy
strategies, including antisense and triple helix
strategies as described in Examples 55 and 56 below. In
antisense approaches, nucleic acid sequences
complementary to an mRNA are hybridized to the mRNA
intracellularly, thereby blocking the expression of the
protein encoded by the mRNA. The antisense sequences may
prevent gene expression through a variety of mechanisms.
For example, the antisense sequences may inhibit the
ability of ribosomes to translate the mRNA.
Alternatively, the antisense sequences may block
transport of the mRNA from the nucleus to the cytoplasm,
thereby limiting the amount of mRNA available for
translation. Another mechanism through which antisense
sequences may inhibit gene expression is by interfering
with mRNA splicing. In yet another strategy, the
antisense nucleic acid may be incorporated in a ribozyme
capable of specifically cleaving the target mRNA.


EXAMPLE 55 

Preparation and Use of Antisense Oligonucleotides 

[0494] The antisense nucleic acid molecules to be used
in gene therapy may be either DNA or RNA sequences. They
may comprise a sequence complementary to the sequence of
the EST-related nucleic acids, positional segments of
EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids. The antisense
nucleic acids should have a length and melting
temperature sufficient to permit formation of an
intracellular duplex with sufficient stability to
inhibit the expression of the mRNA in the duplex.
Strategies for designing antisense nucleic acids
suitable for use in gene therapy are disclosed in Green
et al., Ann. Rev. Biochem. 55:569-597 (1986) and Izant
and Weintraub, Cell 36:1007-1015 (1984), which are
hereby incorporated by reference.
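Deriving an antisense sequence is a reverse-complement operation: the oligo is complementary to the target and antiparallel to it, so it is written 5'->3' by complementing the target and reversing it. The sketch below shows that step and a simple tiling of candidate oligos along a target. The 20-nt window, the 5-nt step and the 40-60% GC filter are hypothetical design parameters standing in for the length and melting-temperature criteria mentioned above, and the function names are illustrative only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def antisense(target: str, rna: bool = False) -> str:
    """Antiparallel complement of `target` (DNA or RNA), written 5'->3'."""
    dna = target.upper().replace("U", "T")
    if set(dna) - set("ACGT"):
        raise ValueError("sequence may contain only A, C, G and T/U")
    comp = dna.translate(COMPLEMENT)[::-1]
    return comp.replace("T", "U") if rna else comp


def tile_antisense(target: str, length: int = 20, step: int = 5,
                   gc_range: tuple[float, float] = (0.4, 0.6)) -> list[tuple[int, str]]:
    """Slide a window along the target and keep (offset, antisense oligo) pairs
    whose GC fraction lies within `gc_range` (a hypothetical filter)."""
    dna = target.upper().replace("U", "T")
    candidates = []
    for start in range(0, len(dna) - length + 1, step):
        window = dna[start:start + length]
        gc = (window.count("G") + window.count("C")) / length
        if gc_range[0] <= gc <= gc_range[1]:
            candidates.append((start, antisense(window)))
    return candidates


if __name__ == "__main__":
    print(antisense("AUGGCC"))            # DNA antisense oligo
    print(antisense("AUGGCC", rna=True))  # RNA antisense transcript
```

Each candidate would still have to be tested experimentally, since sequence rules alone do not predict whether an oligo blocks expression inside the cell.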

[0495] In some strategies, antisense molecules are 
obtained from a nucleotide sequence encoding a protein 
by reversing the orientation of the coding region with re- 
spect to a promoter so as to transcribe the opposite 
strand from that which is normally transcribed in the cell. 
The antisense molecules may be transcribed using in 
vitro transcription systems such as those which employ 
T7 or SP6 polymerase to generate the transcript. Another
approach involves transcription of the antisense
nucleic acids in vivo by operably linking DNA containing
the antisense sequence to a promoter in an expression
vector.

[0496] Alternatively, oligonucleotides which are com- 
plementary to the strand normally transcribed in the cell 
may be synthesized in vitro. Thus, the antisense nucleic 
acids are complementary to the corresponding mRNA 
and are capable of hybridizing to the mRNA to create a 
duplex. In some embodiments, the antisense sequenc- 
es may contain modified sugar phosphate backbones 
to increase stability and make them less sensitive to 
RNase activity. Examples of modifications suitable for 
use in antisense strategies are described by Rossi et 
al., Pharmacol. Ther. 50(2):245-254, (1991) which is 
hereby incorporated by reference. 
[0497] Various types of antisense oligonucleotides 
complementary to the sequence of the EST-related nu- 
cleic acids, positional segments of EST-related nucleic 
acids or fragments of positional segments of EST-relat- 
ed nucleic acids may be used. In one preferred embod- 
iment, stable and semi-stable antisense oligonucle- 
otides described in International Application No. PCT 
WO94/23026, hereby incorporated by reference, are 
used. In these molecules, the 3' end or both the 3' and
5' ends are engaged in intramolecular hydrogen bond- 
ing between complementary base pairs. These mole- 
cules are better able to withstand exonuclease attacks 
and exhibit increased stability compared to conventional 
antisense oligonucleotides. 

[0498] In another preferred embodiment, the anti- 
sense oligodeoxynucleotides against herpes simplex vi- 
rus types 1 and 2 described in International Application 
No. WO 95/04141, hereby incorporated by reference, 
are used. 

[0499] In yet another preferred embodiment, the
covalently cross-linked antisense oligonucleotides
described in International Application No. WO 96/31523,
hereby incorporated by reference, are used. These
double- or single-stranded oligonucleotides comprise one
or more, respectively, inter- or intra-oligonucleotide
covalent cross-linkages, wherein the linkage consists of
an amide bond between a primary amine group of one
strand and a carboxyl group of the other strand or of
the same strand, respectively, the primary amine group
being directly substituted in the 2' position of the
strand nucleotide monosaccharide ring, and the carboxyl
group being carried by an aliphatic spacer group
substituted on a nucleotide or nucleotide analog of the
other strand or the same strand, respectively.
[0500] The antisense oligodeoxynucleotides and
oligonucleotides disclosed in International Application
No. WO 92/18522, incorporated by reference, may also be
used. These molecules are stable to degradation and
contain at least one transcription control recognition
sequence which binds to control proteins and are
effective as decoys therefor. These molecules may
contain "hairpin" structures, "dumbbell" structures,
"modified dumbbell" structures, "cross-linked" decoy
structures and "loop" structures.

[0501] In another preferred embodiment, the cyclic
double-stranded oligonucleotides described in European
Patent Application No. 0 572 287 A2, hereby incorporated
by reference, are used. These ligated oligonucleotide
"dumbbells" contain the binding site for a transcription
factor and inhibit expression of the gene under control
of the transcription factor by sequestering the factor.

[0502] Use of the closed antisense oligonucleotides
disclosed in International Application No. WO 92/19732,
hereby incorporated by reference, is also contemplated.
Because these molecules have no free ends, they are more
resistant to degradation by exonucleases than are
conventional oligonucleotides. These oligonucleotides
may be multifunctional, interacting with several regions
which are not adjacent to the target mRNA.
[0503] The appropriate level of antisense nucleic acids
required to inhibit gene expression may be determined
using in vitro expression analysis. The antisense
molecule may be introduced into the cells by diffusion,
injection, infection or transfection using procedures
known in the art. For example, the antisense nucleic
acids can be introduced into the body as a bare or naked
oligonucleotide, an oligonucleotide encapsulated in
lipid, an oligonucleotide sequence encapsidated by viral
protein, or an oligonucleotide operably linked to a
promoter contained in an expression vector. The
expression vector may be any of a variety of expression
vectors known in the art, including retroviral or viral
vectors, vectors capable of extrachromosomal
replication, or integrating vectors. The vectors may be
DNA or RNA.
[0504] The antisense molecules are introduced onto
cell samples at a number of different concentrations,
preferably between 1 x 10^-10 M and 1 x 10^-4 M. Once the
minimum concentration that can adequately control gene
expression is identified, the optimized dose is
translated into a dosage suitable for use in vivo. For
example, an inhibiting concentration in culture of
1 x 10^-7 M translates into a dose of approximately
0.6 mg/kg bodyweight. Levels of oligonucleotide
approaching 100 mg/kg bodyweight or higher may be
possible after testing the toxicity of the
oligonucleotide in laboratory animals. It is
additionally contemplated that cells from the vertebrate
are removed, treated with the antisense oligonucleotide,
and reintroduced into the vertebrate.
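Testing "a number of different concentrations" across the preferred 10^-10 M to 10^-4 M range and then picking the minimum effective one can be organized as a log-spaced series. The sketch below generates half-log steps (13 concentrations) and returns the lowest tested concentration whose measured fractional inhibition reaches a required level. The half-log spacing, the 50% requirement and the example inhibition values are hypothetical choices; the sketch does not attempt the in vitro to in vivo dose translation described above.

```python
import math


def log_series(low_m: float = 1e-10, high_m: float = 1e-4,
               steps_per_decade: int = 2) -> list[float]:
    """Log-spaced concentrations (mol/L) from low_m to high_m inclusive."""
    decades = round(math.log10(high_m / low_m))
    return [low_m * 10 ** (i / steps_per_decade)
            for i in range(decades * steps_per_decade + 1)]


def minimum_effective(inhibition_by_conc: dict[float, float],
                      required: float = 0.5):
    """Lowest tested concentration whose fractional inhibition meets
    `required` (a hypothetical target), or None if none does."""
    for conc in sorted(inhibition_by_conc):
        if inhibition_by_conc[conc] >= required:
            return conc
    return None


if __name__ == "__main__":
    print([f"{c:.1e}" for c in log_series()])
    # Invented inhibition measurements, for illustration only.
    print(minimum_effective({1e-9: 0.1, 1e-8: 0.3, 1e-7: 0.6, 1e-6: 0.9}))
```

Taking the lowest concentration that meets the target, rather than fitting a curve, matches the text's wording but assumes inhibition rises with concentration over the tested range.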
[0505] It is further contemplated that the antisense
oligonucleotide sequence is incorporated into a ribozyme
sequence to enable the antisense to specifically bind
and cleave its target mRNA. For technical applications
of ribozyme and antisense oligonucleotides see Rossi et
al., supra.

[0506] In a preferred application of this invention,
the polypeptide encoded by the gene is first identified,
so that the effectiveness of antisense inhibition on
translation can be monitored using techniques that
include but are not limited to antibody-mediated tests
such as RIAs and ELISA, functional assays, or
radiolabeling.
[0507] The EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids may
also be used in gene therapy approaches based on
intracellular triple helix formation. Triple helix
oligonucleotides are used to inhibit transcription from
a genome. They are particularly useful for studying
alterations in cell activity as it is associated with a
particular gene. The EST-related nucleic acids,
positional segments of EST-related nucleic acids or
fragments of positional segments of EST-related nucleic
acids of the present invention or, more preferably, a
portion of those sequences, can be used to inhibit gene
expression in individuals having diseases associated
with expression of a particular gene. Similarly, the
EST-related nucleic acids, positional segments of
EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids can be used to
study the effect of inhibiting transcription of a
particular gene within a cell. Traditionally, homopurine
sequences were considered the most useful for triple
helix strategies. However, homopyrimidine sequences can
also inhibit gene expression. Such homopyrimidine
oligonucleotides bind to the major groove at
homopurine:homopyrimidine sequences. Thus, both types of
sequences from the EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids are
contemplated within the scope of this invention.

EXAMPLE 56

Preparation and use of Triple Helix Probes 

[0508] The sequences of the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids are scanned to identify 10-mer to 20-mer homopyrimidine or homopurine stretches which could be used in triple-helix based strategies for inhibiting gene expression. Following identification of candidate homopyrimidine or homopurine stretches, their efficiency in inhibiting gene expression is assessed by introducing varying amounts of oligonucleotides containing the candidate sequences into tissue culture cells which normally express the target gene. The oligonucleotides may be prepared on an oligonucleotide synthesizer or they may be purchased commercially from a company specializing in custom oligonucleotide synthesis, such as GENSET, Paris, France.
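The sequence scan described in [0508] can be sketched in a few lines of Python. This is a minimal sketch, assuming the input is a plain DNA or RNA string; the function name `find_triplex_candidates` and the choice to truncate runs longer than 20 nucleotides to their first 20 bases are illustrative assumptions, not part of the described method.

```python
import re

# Purines are A and G; pyrimidines are C and T (U in RNA).
RUN_PATTERNS = {
    "homopurine": re.compile(r"[AG]+"),
    "homopyrimidine": re.compile(r"[CTU]+"),
}


def find_triplex_candidates(seq, min_len=10, max_len=20):
    """Return (kind, start, end, stretch) for every maximal homopurine or
    homopyrimidine run of at least min_len nucleotides.

    Hypothetical helper. Runs longer than max_len are truncated to their
    first max_len bases (a simplification; other windows inside a long run
    are equally valid candidates). Coordinates are 0-based, end-exclusive.
    """
    seq = seq.upper()
    hits = []
    for kind, pattern in RUN_PATTERNS.items():
        for match in pattern.finditer(seq):
            start = match.start()
            if match.end() - start >= min_len:
                end = min(match.end(), start + max_len)
                hits.append((kind, start, end, seq[start:end]))
    # Report candidates in the order they occur along the sequence.
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    for hit in find_triplex_candidates("CCAGAGAGGAAGGTTTCTCCTTCTCA"):
        print(hit)
```

Each reported stretch would then be synthesized as a candidate oligonucleotide and tested in cell culture as the paragraph describes; the scan only narrows the list of candidates.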

[0509] The oligonucleotides may be introduced into the cells using a variety of methods known to those skilled in the art, including but not limited to calcium phosphate precipitation, DEAE-Dextran, electroporation, liposome-mediated transfection or native uptake.
[0510] Treated cells are monitored for altered cell function or reduced gene expression using techniques such as Northern blotting, RNase protection assays, or PCR-based strategies to monitor the transcription levels of the target gene in cells which have been treated with the oligonucleotide. The cell functions to be monitored are predicted based upon the homologies of the target genes corresponding to the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids from which the oligonucleotides were derived with known gene sequences that have been associated with a particular function. The cell functions can also be predicted based on the presence of abnormal physiologies within cells derived from individuals with a particular inherited disease, particularly when the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids are associated with the disease using techniques described herein.
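For the PCR-based monitoring mentioned in [0510], one widely used way to express knockdown is the comparative Ct (2^-ΔΔCt) method, which normalizes the target gene to a reference gene and compares treated with untreated cells. The specification does not prescribe this calculation; the sketch below is an illustrative assumption, the function names are hypothetical, and the Ct values in the example are invented for demonstration. The method also assumes close to 100% amplification efficiency for both genes.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of the target transcript in treated vs. control cells,
    using the comparative Ct (2**-ddCt) method.

    Assumes near-100% PCR efficiency for both the target and the reference
    (housekeeping) gene.
    """
    d_ct_treated = ct_target_treated - ct_ref_treated  # normalize to reference
    d_ct_control = ct_target_control - ct_ref_control
    dd_ct = d_ct_treated - d_ct_control                # treated vs. control
    return 2.0 ** (-dd_ct)


def percent_knockdown(fold_change):
    """Percentage reduction in target transcript relative to control."""
    return 100.0 * (1.0 - fold_change)


if __name__ == "__main__":
    # Invented example values: the target crosses threshold 2 cycles later
    # in treated cells while the reference gene is unchanged.
    fold = relative_expression(26.0, 18.0, 24.0, 18.0)
    print(fold, percent_knockdown(fold))  # 0.25 75.0
```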

[0511] The oligonucleotides which are effective in inhibiting gene expression in tissue culture cells may then be introduced in vivo using the techniques described above and in Example 55, at a dosage calculated based on the in vitro results as described in Example 55.
[0512] In some embodiments, the natural (beta) anomers of the oligonucleotide units can be replaced with alpha anomers to render the oligonucleotide more resistant to nucleases. Further, an intercalating agent such as ethidium bromide, or the like, can be attached to the 3' end of the alpha oligonucleotide to stabilize the triple helix. For information on the generation of oligonucleotides suitable for triple helix formation, see Griffin et al. (Science 245:967-971 (1989), which is hereby incorporated by this reference).

EXAMPLE 57 

Use of EST-related nucleic acids, positional segments
of EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids to express an
Encoded Protein in a Host Organism

[0513] The EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids may also be used to express an encoded protein or polypeptide in a host organism to produce a beneficial effect. In addition, nucleic acids encoding the EST-related polypeptides, positional segments of EST-related polypeptides or fragments of positional segments of EST-related polypeptides may be used to express the encoded protein or polypeptide in a host organism to produce a beneficial effect.

[0514] In such procedures, the encoded protein or polypeptide may be transiently expressed in the host organism or stably expressed in the host organism. The encoded protein or polypeptide may have any of the activities described above. The encoded protein or polypeptide may be a protein or polypeptide which the host organism lacks or, alternatively, the encoded protein may augment the existing levels of the protein in the host organism.

[0515] In some embodiments in which the protein or polypeptide is secreted, nucleic acids encoding the full-length protein (i.e. the signal peptide and the mature protein), or nucleic acids encoding only the mature protein (i.e. the protein generated when the signal peptide is cleaved off), are introduced into the host organism.
[0516] The nucleic acids encoding the proteins or polypeptides may be introduced into the host organism using a variety of techniques known to those of skill in the art. For example, the extended cDNA may be injected into the host organism as naked DNA such that the encoded protein is expressed in the host organism, thereby producing a beneficial effect.
[0517] Alternatively, the nucleic acids encoding the protein or polypeptide may be cloned into an expression vector downstream of a promoter which is active in the host organism. The expression vector may be any of the expression vectors designed for use in gene therapy, including viral or retroviral vectors. The expression vector may be directly introduced into the host organism such that the encoded protein is expressed in the host organism to produce a beneficial effect. In another approach, the expression vector may be introduced into cells in vitro. Cells containing the expression vector are thereafter selected and introduced into the host organism, where they express the encoded protein or polypeptide to produce a beneficial effect.

EXAMPLE 58 

Use of Signal Peptides To Import Proteins Into Cells 


[0518] The short core hydrophobic region (h) of signal peptides encoded by the sequences of SEQ ID NOs. 24-383 and 1339-2059 may also be used as a carrier to import a peptide or a protein of interest, so-called cargo, into tissue culture cells (Lin et al., J. Biol. Chem., 270: 14225-14258 (1995); Du et al., J. Peptide Res., 51: 235-243 (1998); Rojas et al., Nature Biotech., 16: 370-375 (1998)).

[0519] When cell-permeable peptides of limited size (approximately up to 25 amino acids) are to be translocated across the cell membrane, chemical synthesis may be used to add the h region to either the C-terminus or the N-terminus of the cargo peptide of interest. Alternatively, when longer peptides or proteins are to be imported into cells, nucleic acids can be genetically engineered, using techniques familiar to those skilled in the art, in order to link the extended cDNA sequence encoding the h region to the 5' or the 3' end of a DNA sequence coding for a cargo polypeptide. Such genetically engineered nucleic acids are then translated either in vitro or in vivo after transfection into appropriate cells, using conventional techniques, to produce the resulting cell-permeable polypeptide. Suitable host cells are then simply incubated with the cell-permeable polypeptide, which is then translocated across the membrane.
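When the h region is attached genetically, as in the second approach of [0519], the two coding sequences have to be joined in frame, and the stop codon of whichever sequence is placed upstream must be removed. A minimal sketch follows, assuming both inputs are DNA coding sequences whose lengths are multiples of three; the function names are hypothetical, and the sketch deliberately ignores start codons, linkers and regulatory elements that a real construct would also need.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def _codons(seq):
    """Split a coding sequence into consecutive triplets."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def fuse_in_frame(upstream, downstream):
    """Join two coding sequences so that `downstream` is read in frame.

    Hypothetical helper. A terminal stop codon on the upstream part is
    dropped; any other in-frame stop in the upstream part would truncate the
    fusion product, so it is rejected.
    """
    upstream, downstream = upstream.upper(), downstream.upper()
    for part in (upstream, downstream):
        if len(part) % 3:
            raise ValueError("coding sequence length must be a multiple of 3")
    if upstream[-3:] in STOP_CODONS:
        upstream = upstream[:-3]
    if any(codon in STOP_CODONS for codon in _codons(upstream)):
        raise ValueError("in-frame stop codon inside the upstream sequence")
    return upstream + downstream


def build_h_region_fusion(h_region_cds, cargo_cds, h_at_5prime=True):
    """Place the h-region coding sequence at the 5' or 3' end of the cargo."""
    if h_at_5prime:
        return fuse_in_frame(h_region_cds, cargo_cds)
    return fuse_in_frame(cargo_cds, h_region_cds)
```

For example, `fuse_in_frame("ATGGCTTAA", "GCAGCATGA")` drops the upstream `TAA` and returns `"ATGGCTGCAGCATGA"`; the sequences are arbitrary placeholders, not h-region sequences from the listed SEQ ID NOs.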
[0520] This method may be applied to study diverse intracellular functions and cellular processes. For instance, it has been used to probe functionally relevant domains of intracellular proteins and to examine protein-protein interactions involved in signal transduction pathways (Lin et al., supra; Lin et al., J. Biol. Chem., 271: 5305-5308 (1996); Rojas et al., J. Biol. Chem., 271: 27456-27461 (1996); Liu et al., Proc. Natl. Acad. Sci. USA, 93: 11819-11824 (1996); Rojas et al., Biochem. Biophys. Res. Commun., 234: 675-680 (1997)).
[0521] Such techniques may be used in cellular therapy to import proteins producing therapeutic effects. For instance, cells isolated from a patient may be treated with imported therapeutic proteins and then re-introduced into the host organism.

[0522] Alternatively, the h region of signal peptides of the present invention could be used in combination with a nuclear localization signal to deliver nucleic acids into the cell nucleus. Such oligonucleotides may be antisense oligonucleotides or oligonucleotides designed to form triple helices, as described above, in order to inhibit processing and maturation of a target cellular RNA.

EXAMPLE 59 

Computer Embodiments 

[0523] As used herein, the term "cDNA codes of SEQ ID NOs. 24-3883 and 7744-19335" encompasses the nucleotide sequences of SEQ ID NOs. 24-3883 and 7744-19335, fragments of SEQ ID NOs. 24-3883 and 7744-19335, nucleotide sequences homologous to SEQ ID NOs. 24-3883 and 7744-19335 or homologous to fragments of SEQ ID NOs. 24-3883 and 7744-19335, and sequences complementary to all of the preceding sequences. The fragments include fragments of SEQ ID NOs. 24-3883 and 7744-19335 comprising at least 8, 10, 12, 15, 18, 20, 25, 28, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400, 500, 1000 or 2000 consecutive nucleotides of SEQ ID NOs. 24-3883 and 7744-19335. Preferably, the fragments are novel fragments. Preferably, the fragments include polynucleotides described in Tables IVa and IVb, polynucleotides described in Tables IVa and IVb updated, or fragments thereof comprising at least 8, 10, 12, 15, 18, 20, 25, 28, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400, 500, 1000 or 2000 consecutive nucleotides of the polynucleotides described in Tables IVa and IVb, or polynucleotides described in Tables IVa and IVb updated. Homologous sequences and fragments of SEQ ID NOs. 24-3883 and 7744-19335 refer to a sequence having at least 99%, 98%, 97%, 96%, 95%, 90%, 85%, 80%, or 75% homology to these sequences. Homology may be determined using any of the computer programs and parameters described in Example 17, including BLAST2N with the default parameters or with any modified parameters. Homologous sequences also include RNA sequences in which uridines replace the thymines in the cDNA codes of SEQ ID NOs. 24-3883 and 7744-19335. The homologous sequences may be obtained using any of the procedures described herein or may result from the correction of a sequencing error as described above. Preferably, the homologous sequences and fragments of SEQ ID NOs. 24-3883 and 7744-19335 include polynucleotides described in Tables IVa and IVb, polynucleotides described in Tables IVa and IVb updated, or fragments comprising at least 8, 10, 12, 15, 18, 20, 25, 28, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400, 500, 1000 or 2000 consecutive nucleotides of the polynucleotides described in Tables IVa and IVb, or polynucleotides described in Tables IVa and IVb updated. It will be appreciated that the cDNA codes of SEQ ID NOs. 24-3883 and 7744-19335 can be represented in the traditional single-character format (see the inside back cover of Stryer, Lubert. Biochemistry, 3rd edition. W. H. Freeman & Co., New York.) or in any other format which records the identity of the nucleotides in a sequence.
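Paragraph [0523] counts complementary sequences, RNA sequences (uridine in place of thymine) and runs of consecutive nucleotides as forms of the cDNA codes. Producing these forms from a stored sequence is mechanical. The sketch below is a minimal illustration, assuming uppercase A/C/G/T input without IUPAC ambiguity codes; all function names are hypothetical.

```python
# Watson-Crick base pairing for DNA.
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq):
    """Complement of each base, kept in the input order
    (i.e. the complementary strand read 3'->5')."""
    return seq.upper().translate(_DNA_COMPLEMENT)


def reverse_complement(seq):
    """Complementary strand written 5'->3'."""
    return complement(seq)[::-1]


def to_rna(seq):
    """RNA form of a DNA sequence: uridine replaces thymine."""
    return seq.upper().replace("T", "U")


def consecutive_fragments(seq, length):
    """All fragments of exactly `length` consecutive nucleotides."""
    return [seq[i:i + length] for i in range(len(seq) - length + 1)]
```

For instance, `reverse_complement("ATGC")` gives `"GCAT"` and `to_rna("ATGT")` gives `"AUGU"`.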

[0524] As used herein, the term "polypeptide codes of SEQ ID NOS. 3884-7743" encompasses the polypeptide sequences of SEQ ID NOs. 3884-7743, which are encoded by the cDNAs of SEQ ID NOs. 24-3883, polypeptide sequences homologous to the polypeptides of SEQ ID NOS. 3884-7743, or fragments of any of the preceding sequences. Homologous polypeptide sequences refer to a polypeptide sequence having at least 99%, 98%, 97%, 96%, 95%, 90%, 85%, 80%, or 75% homology to one of the polypeptide sequences of SEQ ID NOS. 3884-7743. Homology may be determined using any of the computer programs and parameters described herein, including FASTA with the default parameters or with any modified parameters. The homologous sequences may be obtained using any of the procedures described herein or may result from the correction of a sequencing error as described above. The polypeptide fragments comprise at least 5, 8, 10, 12, 15, 20, 25, 30, 35, 40, 50, 60, 75, 100, 150 or 200 consecutive amino acids of the polypeptides of SEQ ID NOS. 3884-7743. Preferably, the fragments are novel fragments. Preferably, the fragments include polypeptides encoded by the polynucleotides described in Tables IVa and IVb, polynucleotides described in Tables IVa and IVb updated, or fragments thereof comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids of the polypeptides encoded by the polynucleotides described in Tables IVa and IVb, or polynucleotides described in Tables IVa and IVb updated. It will be appreciated that the polypeptide codes of SEQ ID NOS. 3884-7743 can be represented in the traditional single-character format or three-letter format (see the inside back cover of Stryer, Lubert. Biochemistry, 3rd edition. W. H. Freeman & Co., New York.) or in any other format which records the identity of the amino acids in a sequence.
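Converting a polypeptide code between the single-character and three-letter formats mentioned in [0524] is a table lookup. The mapping below is the standard one for the 20 common amino acids; the function names and the hyphen separator are illustrative assumptions, and non-standard residue codes (for example X) are not handled and raise `KeyError`.

```python
# Standard one-letter to three-letter codes for the 20 common amino acids.
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}
# Reverse lookup, keyed case-insensitively via upper-case codes.
THREE_TO_ONE = {three.upper(): one for one, three in ONE_TO_THREE.items()}


def to_three_letter(seq, sep="-"):
    """Convert a one-letter polypeptide code to three-letter format."""
    return sep.join(ONE_TO_THREE[aa] for aa in seq.upper())


def to_one_letter(seq, sep="-"):
    """Convert a three-letter polypeptide code back to one-letter format."""
    return "".join(THREE_TO_ONE[code.upper()] for code in seq.split(sep))
```

For example, `to_three_letter("MKV")` returns `"Met-Lys-Val"`, and `to_one_letter` reverses it.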

[0525] It will be appreciated by those skilled in the art that the cDNA codes of SEQ ID NOs. 24-3883 and 7744-19335 and the polypeptide codes of SEQ ID NOS. 3884-7743 can be stored, recorded, and manipulated on any medium which can be read and accessed by a computer. As used herein, the words "recorded" and "stored" refer to a process for storing information on a computer medium. A skilled artisan can readily adopt any of the presently known methods for recording information on a computer readable medium to generate manufactures comprising one or more of the cDNA codes of SEQ ID NOs. 24-3883 and 7744-19335, or one or more of the polypeptide codes of SEQ ID NOS. 3884-7743. Another aspect of the present invention is a computer readable medium having recorded thereon at least 2, 5, 10, 15, 20, 25, 30, or 50 cDNA codes of SEQ ID NOs. 24-3883 and 7744-19335. Another aspect of the present invention is a computer readable medium having recorded thereon at least 2, 5, 10, 15, 20, 25, 30, or 50 polypeptide codes of SEQ ID NOS. 3884-7743.
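As one concrete instance of recording sequence codes on a computer readable medium, sequences can be written to and read back from a plain-text FASTA file. FASTA is chosen here only as a common convention; the specification does not name a file format, and the record identifiers used with this sketch are hypothetical placeholders rather than actual SEQ ID NO. entries.

```python
def write_fasta(path, records, width=60):
    """Write {identifier: sequence} records to a FASTA file,
    wrapping sequence lines at `width` characters."""
    with open(path, "w", encoding="ascii") as handle:
        for identifier, seq in records.items():
            handle.write(f">{identifier}\n")
            for i in range(0, len(seq), width):
                handle.write(seq[i:i + width] + "\n")


def read_fasta(path):
    """Read a FASTA file back into an ordered {identifier: sequence} dict."""
    records, identifier, chunks = {}, None, []
    with open(path, encoding="ascii") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new header closes the previous record.
                if identifier is not None:
                    records[identifier] = "".join(chunks)
                identifier, chunks = line[1:].strip(), []
            else:
                chunks.append(line)
    if identifier is not None:
        records[identifier] = "".join(chunks)
    return records
```

Because the file is plain text, the same records can be copied to any of the media listed in [0526] and read back unchanged.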

[0526] Computer readable media include magnetically readable media, optically readable media, electronically readable media and magnetic/optical media. For example, the computer readable media may be a hard disk, a floppy disk, a magnetic tape, CD-ROM, Digital Versatile Disk (DVD), Random Access Memory (RAM), or Read Only Memory (ROM), as well as other types of media known to those skilled in the art.
[0527] Embodiments of the present invention include systems, particularly computer systems, which store and manipulate the sequence information described herein. One example of a computer system 100 is illustrated in block diagram form in Figure 6. As used herein, "a computer system" refers to the hardware components, software components, and data storage components used to analyze the nucleotide sequences of the cDNA codes of SEQ ID NOs. 24-3883 and 7744-19335, or the amino acid sequences of the polypeptide codes of SEQ ID NOS. 3884-7743. In one embodiment, the computer system 100 is a Sun Enterprise 1000 server (Sun Microsystems, Palo Alto, CA). The computer system 100 preferably includes a processor 105 for processing, accessing and manipulating the sequence data. The processor 105 can be any well-known type of central processing unit, such as the Pentium III from Intel Corporation, or a similar processor from Sun, Motorola, Compaq or International Business Machines.
[0528] Preferably, the computer system 100 is a general purpose system that comprises the processor 105 and one or more internal data storage components 110 for storing data, and one or more data retrieving devices for retrieving the data stored on the data storage components. A skilled artisan can readily appreciate that any one of the currently available computer systems is suitable.

[0529] In one particular embodiment, the computer system 100 includes a processor 105 connected to a bus which is connected to a main memory 115 (preferably implemented as RAM) and one or more internal data storage devices 110, such as a hard drive and/or other computer readable media having data recorded thereon. In some embodiments, the computer system 100 further includes one or more data retrieving devices 118 for reading the data stored on the internal data storage devices 110.

[0530] The data retrieving device 118 may represent, for example, a floppy disk drive, a compact disk drive, a magnetic tape drive, etc. In some embodiments, the internal data storage device 110 is a removable computer readable medium such as a floppy disk, a compact disk, a magnetic tape, etc. containing control logic and/or data recorded thereon. The computer system 100 may advantageously include or be programmed by appropriate software for reading the control logic and/or the data from the data storage component once inserted in the data retrieving device.

[0531] The computer system 100 includes a display 120 which is used to display output to a computer user. It should also be noted that the computer system 100 can be linked to other computer systems 125a-c in a network or wide area network to provide centralized access to the computer system 100.
[0532] Software for accessing and processing the nucleotide sequences of the cDNA codes of SEQ ID NOs. 24-3883 and 7744-19335, or the amino acid sequences of the polypeptide codes of SEQ ID NOS. 3884-7743 (such as search tools, compare tools, and modeling tools, etc.) may reside in main memory 115 during execution.
[0533] In some embodiments, the computer system 100 may further comprise a sequence comparer for comparing the above-described cDNA codes of SEQ ID NOs. 24-3883 and 7744-19335 or polypeptide codes of SEQ ID NOS. 3884-7743 stored on a computer readable medium to reference nucleotide or polypeptide sequences stored on a computer readable medium. A "sequence comparer" refers to one or more programs which are implemented on the computer system 100 to compare a nucleotide or polypeptide sequence with other nucleotide or polypeptide sequences and/or compounds including, but not limited to, peptides, peptidomimetics, and chemicals stored within the data storage means. For example, the sequence comparer may compare the nucleotide sequences of the cDNA codes of SEQ ID NOs. 24-3883 and 7744-19335, or the amino acid sequences of the polypeptide codes of SEQ ID NOS. 3884-7743, stored on a computer readable medium to reference sequences stored on a computer readable medium to identify homologies, motifs implicated in biological function, or structural motifs. The various sequence comparer programs identified elsewhere in this patent specification are particularly contemplated for use in this aspect of the invention.
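Identifying motifs implicated in biological function can, in the simplest case, be a pattern search. As an illustration only, the sketch below scans a polypeptide code for the N-glycosylation sequon N-{P}-[ST]-{P} (Asn, any residue except Pro, Ser or Thr, any residue except Pro), written as a Python regular expression. The choice of motif and the function name are assumptions; a zero-width lookahead is used so that overlapping occurrences are all reported.

```python
import re

# N-linked glycosylation sequon: Asn, not Pro, Ser/Thr, not Pro.
# The pattern is wrapped in a lookahead so overlapping hits are found.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def find_motif(protein, pattern=N_GLYCOSYLATION):
    """Return (0-based start, matched residues) for each occurrence of the
    motif in a one-letter polypeptide code, including overlapping ones."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(protein.upper())]
```

For example, `find_motif("MNVSAKNPTL")` reports only `(1, "NVSA")`, because the second asparagine is followed by proline.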
[0534] Figure 7 is a flow diagram illustrating one embodiment of a process 200 for comparing a new nucleotide or protein sequence with a database of sequences in order to determine the homology levels between the new sequence and the sequences in the database. The database of sequences can be a private database stored within the computer system 100, or a public database such as GenBank, PIR or SWISS-PROT that is available through the Internet.
[0535] The process 200 begins at a start state 201 and then moves to a state 202 wherein the new sequence to be compared is stored to a memory in a computer system 100. As discussed above, the memory could be any type of memory, including RAM or an internal storage device.

[0536] The process 200 then moves to a state 204 wherein a database of sequences is opened for analysis and comparison. The process 200 then moves to a state 206 wherein the first sequence stored in the database is read into a memory on the computer. A comparison is then performed at a state 210 to determine if the new sequence is the same as the first sequence in the database. It is important to note that this step is not limited to performing an exact comparison between the new sequence and the first sequence in the database. Methods for comparing two nucleotide or protein sequences, even if they are not identical, are well known to those of skill in the art. For example, gaps can be introduced into one sequence in order to raise the homology level between the two tested sequences. The parameters that control whether gaps or other features are introduced into a sequence during comparison are normally entered by the user of the computer system.

[0537] Once a comparison of the two sequences has been performed at the state 210, a determination is made at a decision state 212 whether the two sequences are the same. Of course, the term "same" is not limited to sequences that are absolutely identical. Sequences that are within the homology parameters entered by the user will be marked as "same" in the process 200.

[0538] If a determination is made that the two sequences are the same, the process 200 moves to a state 214 wherein the name of the sequence from the database is displayed to the user. This state notifies the user that the sequence with the displayed name fulfills the homology constraints that were entered. Once the name of the stored sequence is displayed to the user, the process 200 moves to a decision state 218 wherein a determination is made whether more sequences exist in the database. If no more sequences exist in the database, then the process 200 terminates at an end state 220. However, if more sequences do exist in the database, then the process 200 moves to a state 224 wherein a pointer is moved to the next sequence in the database so that it can be compared to the new sequence. In this manner, the new sequence is aligned and compared with every sequence in the database.
[0539] It should be noted that if a determination had been made at the decision state 212 that the sequences were not homologous, then the process 200 would move immediately to the decision state 218 in order to determine if any other sequences were available in the database for comparison.

[0540] Accordingly, one aspect of the present invention is a computer system comprising a processor, a data storage device having stored thereon a nucleic acid code of SEQ ID NOs. 24-3883 and 7744-19335 or a polypeptide code of SEQ ID NOS. 3884-7743, a data storage device having retrievably stored thereon reference nucleotide sequences or polypeptide sequences to be compared to the nucleic acid code of SEQ ID NOs. 24-3883 and 7744-19335 or polypeptide code of SEQ ID NOS. 3884-7743, and a sequence comparer for conducting the comparison. The sequence comparer may indicate a homology level between the sequences compared or identify structural motifs in the above-described nucleic acid codes of SEQ ID NOs. 24-3883 and 7744-19335 and polypeptide codes of SEQ ID NOS. 3884-7743, or it may identify structural motifs in sequences which are compared to these cDNA codes and polypeptide codes. In some embodiments, the data storage device may have stored thereon the sequences of at least 2, 5, 10, 15, 20, 25, 30, or 50 of the cDNA codes of SEQ ID NOs. 24-3883 and 7744-19335 or polypeptide codes of SEQ ID NOS. 3884-7743.
[0541] Another aspect of the present invention is a method for determining the level of homology between a nucleic acid code of SEQ ID NOs. 24-3883 and 7744-19335 and a reference nucleotide sequence, comprising the steps of reading the nucleic acid code and the reference nucleotide sequence through the use of a computer program which determines homology levels, and determining homology between the nucleic acid code and the reference nucleotide sequence with the computer program. The computer program may be any of a number of computer programs for determining homology levels, including those specifically enumerated herein, including BLAST2N with the default parameters or with any modified parameters. The method may be implemented using the computer systems described above. The method may also be performed by reading 2, 5, 10, 15, 20, 25, 30, or 50 of the above-described cDNA codes of SEQ ID NOs. 24-3883 and 7744-19335 through use of the computer program and determining homology between the cDNA codes and reference nucleotide sequences.

[0542] Figure 8 is a flow diagram illustrating one embodiment of a process 250 in a computer for determining whether two sequences are homologous. The process 250 begins at a start state 252 and then moves to a state 254 wherein a first sequence to be compared is stored to a memory. The second sequence to be compared is then stored to a memory at a state 256. The process 250 then moves to a state 260 wherein the first character in the first sequence is read, and then to a state 262 wherein the first character of the second sequence is read. It should be understood that if the sequence is a nucleotide sequence, then the character would normally be either A, T, C, G or U. If the sequence is a protein sequence, then it should be in the single-letter amino acid code so that the first and second sequences can be easily compared.

[0543] A determination is then made at a decision state 264 whether the two characters are the same. If they are the same, then the process 250 moves to a state 268 wherein the next characters in the first and second sequences are read. A determination is then made whether the next characters are the same. If they are, then the process 250 continues this loop until two characters are not the same. If a determination is made that the next two characters are not the same, the process 250 moves to a decision state 274 to determine whether there are any more characters in either sequence to read.

[0544] If there are no more characters to read, then the process 250 moves to a state 276 wherein the level of homology between the first and second sequences is displayed to the user. The level of homology is determined by calculating the proportion of characters between the sequences that were the same out of the total number of characters in the first sequence. Thus, if every character in a first 100-nucleotide sequence aligned with every character in a second sequence, the homology level would be 100%.
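The homology level computed by process 250 (characters that match, divided by the number of characters in the first sequence) can be written directly. This sketch assumes, as the flow chart does, a position-by-position comparison without gaps; the function name is hypothetical.

```python
def homology_level(first, second):
    """Percentage of characters in `first` that match `second` at the same
    position (ungapped, as in process 250).

    Positions of `first` beyond the end of `second` count as mismatches,
    because the denominator is always the length of the first sequence.
    """
    if not first:
        raise ValueError("first sequence must not be empty")
    same = sum(1 for a, b in zip(first.upper(), second.upper()) if a == b)
    return 100.0 * same / len(first)
```

For the 100-nucleotide example in [0544], two identical sequences give 100.0; a single mismatch in a 10-nucleotide sequence, as in `homology_level("ACGTACGTAC", "ACGTTCGTAC")`, gives 90.0.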
[0545] Alternatively, the computer program may be a computer program which compares the nucleotide sequences of the cDNA codes of the present invention to reference nucleotide sequences in order to determine whether the nucleic acid code of SEQ ID NOs. 24-3883 and 7744-19335 differs from a reference nucleic acid sequence at one or more positions. Optionally, such a program records the length and identity of inserted, deleted or substituted nucleotides with respect to the sequence of either the reference polynucleotide or the nucleic acid code of SEQ ID NOs. 24-3883 and 7744-19335. In one embodiment, the computer program may be a program which determines whether the nucleotide sequences of the cDNA codes of SEQ ID NOs. 24-3883 and 7744-19335 contain a biallelic marker or single nucleotide polymorphism (SNP) with respect to a reference nucleotide sequence. This single nucleotide polymorphism may comprise a single base substitution, insertion, or deletion, while this biallelic marker may comprise about one to ten consecutive bases substituted, inserted or deleted.

[0546] Another aspect of the present invention is a method for determining the level of homology between a polypeptide code of SEQ ID NOS. 3884-7743 and a reference polypeptide sequence, comprising the steps of reading the polypeptide code of SEQ ID NOS. 3884-7743 and the reference polypeptide sequence through use of a computer program which determines homology levels, and determining homology between the polypeptide code and the reference polypeptide sequence using the computer program. Accordingly, another aspect of the present invention is a method for determining whether a nucleic acid code of SEQ ID NOs. 24-3883 and 7744-19335 differs at one or more nucleotides from a reference nucleotide sequence, comprising the steps of reading the nucleic acid code and the reference nucleotide sequence through use of a computer program which identifies differences between nucleic acid sequences, and identifying differences between the nucleic acid code and the reference nucleotide sequence with the computer program. In some embodiments, the computer program is a program which identifies single nucleotide polymorphisms. The method may be implemented by the computer systems described above and the method illustrated in Figure 8. The method may also be performed by reading at least 2, 5, 10, 15, 20, 25, 30, or 50 of the cDNA codes of SEQ ID NOs. 24-3883 and 7744-19335 and the reference nucleotide sequences through the use of the computer program and identifying differences between the cDNA codes and the reference nucleotide sequences with the computer program.
[0547] In other embodiments, the computer-based system may further comprise an identifier for identifying features within the nucleotide sequences of the cDNA codes of SEQ ID NOs. 24-3883 and 7744-19335 or the amino acid sequences of the polypeptide codes of SEQ ID NOS. 3884-7743.

[0548] An "identifier" refers to one or more programs which identify certain features within the above-described nucleotide sequences of the cDNA codes of SEQ ID NOs. 24-3883 and 7744-19335 or the amino acid sequences of the polypeptide codes of SEQ ID NOS. 3884-7743. In one embodiment, the identifier may comprise a program which identifies an open reading frame in the cDNA codes of SEQ ID NOs. 24-3883 and 7744-19335.
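A minimal sketch of the open-reading-frame identifier mentioned in [0548] follows, assuming the standard genetic code (ATG start; TAA, TAG and TGA stops) and scanning only the three forward reading frames. The function name and the default minimum length of 30 codons are hypothetical choices; reverse-strand frames could be covered by running the same scan on the reverse complement.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def find_orfs(seq, min_codons=30):
    """Return (start, end) pairs, 0-based and end-exclusive, for ATG-initiated
    ORFs in the three forward frames. `end` includes the stop codon, and
    min_codons counts codons including the stop.

    An ORF that runs off the end of the sequence without a stop is ignored.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA input as well
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)
```

For example, `find_orfs("CCATGAAATTTTAGCC", min_codons=4)` returns `[(2, 14)]`, the ORF `ATG AAA TTT TAG` in the third frame.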

[0549] Figure 9 is a flow diagram illustrating one embodiment of an identifier process 300 for detecting the presence of a feature in a sequence. The process 300 begins at a start state 302 and then moves to a state 304 wherein a first sequence that is to be checked for features is stored to a memory 115 in the computer system 100. The process 300 then moves to a state 306 wherein a database of sequence features is opened. Such a database would include a list of each feature's attributes along with the name of the feature. For example, a feature name could be "Initiation Codon" and the attribute would be "ATG". Another example would be the feature name "TAATAA Box" and the feature attribute would be "TAATAA". An example of such a database is produced by the University of Wisconsin Genetics Computer Group (www.gcg.com).

[0550] Once the database of features is opened at the state 306, the process 300 moves to a state 308 wherein the first feature is read from the database. A comparison of the attribute of the first feature with the first sequence is then made at a state 310. A determination is then made at a decision state 316 whether the attribute of the feature was found in the first sequence. If the attribute was found, then the process 300 moves to a state 318 wherein the name of the found feature is displayed to the user.

[0551] The process 300 then moves to a decision state 320 wherein a determination is made whether more features exist in the database. If no more features exist, then the process 300 terminates at an end state 324. However, if more features do exist in the database, then the process 300 reads the next sequence feature at a state 326 and loops back to the state 310 wherein the attribute of the next feature is compared against the first sequence.

[0552] It should be noted that if the feature attribute is not found in the first sequence at the decision state 316, the process 300 moves directly to the decision state 320 in order to determine if any more features exist in the database.
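The loop of process 300 can be sketched in a few lines of Python. This is an illustrative assumption, not the application's software: the in-memory list stands in for a real feature database such as the GCG one, it holds only the two example features named above, and a plain substring test is used as the comparison at state 310.

```python
# Hypothetical in-memory stand-in for the feature database opened at state 306:
# each entry pairs a feature name with its attribute (a literal sequence string).
FEATURE_DB = [
    ("Initiation Codon", "ATG"),
    ("TAATAA Box", "TAATAA"),
]


def identify_features(sequence, feature_db=FEATURE_DB):
    """Return the names of features whose attribute occurs in `sequence`."""
    sequence = sequence.upper()              # state 304: sequence held in memory
    found = []
    for name, attribute in feature_db:       # states 308/326: read next feature
        if attribute in sequence:            # states 310/316: compare and decide
            found.append(name)               # state 318: report the feature name
        # if not found, control passes straight to state 320 (more features?)
    return found                             # end state 324: no features left


print(identify_features("ccgTAATAAggATGc"))  # prints ['Initiation Codon', 'TAATAA Box']
```

Returning a list instead of displaying each name as it is found is a design choice of the sketch; it keeps the scan testable and leaves presentation to the caller.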

[0553] In another embodiment, the identifier may comprise a molecular modeling program which determines the 3-dimensional structure of the polypeptide codes of SEQ ID NOS. 3884-7743. In some embodiments, the molecular modeling program identifies target sequences that are most compatible with profiles representing the structural environments of the residues in known three-dimensional protein structures. (See, e.g., Eisenberg et al., U.S. Patent No. 5,436,850 issued July 25, 1995). In another technique, the known three-dimensional structures of proteins in a given family are superimposed to define the structurally conserved regions in that family. This protein modeling technique also uses the known three-dimensional structure of a homologous protein to approximate the structure of the polypeptide codes of SEQ ID NOS. 3884-7743. (See, e.g., Srinivasan et al., U.S. Patent No. 5,557,535 issued September 17, 1996). Conventional homology modeling techniques have been used routinely to build models of proteases and antibodies. (Sowdhamini et al., Protein Engineering 10:207-215 (1997)). Comparative approaches can also be used to develop three-dimensional protein models when the protein of interest has poor sequence identity to template proteins. In some cases, proteins fold into similar three-dimensional structures despite having very weak sequence identities. For example, a number of helical cytokines fold into a similar three-dimensional topology in spite of weak sequence homology.
[0554] The recent development of threading methods now enables the identification of likely folding patterns in a number of situations where the structural relatedness between target and template(s) is not detectable at the sequence level. Hybrid methods have also been described in which fold recognition is performed using Multiple Sequence Threading (MST), the structural equivalencies deduced from the threading output are used by the distance geometry program DRAGON to construct a low-resolution model, and a full-atom representation is constructed using a molecular modeling package such as QUANTA.
[0555] According to this three-step approach, candidate templates are first identified by using the novel fold recognition algorithm MST, which is capable of performing simultaneous threading of multiple aligned sequences onto one or more 3-D structures. In a second step, the structural equivalencies obtained from the MST output are converted into inter-residue distance restraints and fed into the distance geometry program DRAGON, together with auxiliary information obtained from secondary structure predictions. The program combines the restraints in an unbiased manner and rapidly generates a large number of low-resolution model conformations. In a third step, these low-resolution model conformations are converted into full-atom models and subjected to energy minimization using the molecular modeling package QUANTA. (See, e.g., Aszodi et al., Proteins: Structure, Function, and Genetics, Supplement 1:38-42 (1997)).
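The second step, converting structural equivalencies into inter-residue distance restraints, can be pictured with the short Python sketch below. It does not reproduce MST, DRAGON or QUANTA. The data layout (a mapping from target residue to template residue, and one coordinate triple per template residue, for instance its C-alpha atom), the plus-or-minus 1.0 angstrom tolerance and the 15 angstrom cut-off are illustrative assumptions.

```python
import math
from itertools import combinations


def distance_restraints(equivalences, template_coords, tolerance=1.0, cutoff=15.0):
    """Turn target->template residue equivalences into (i, j, lower, upper) bounds.

    equivalences: dict mapping target residue number -> template residue number
    template_coords: dict mapping template residue number -> (x, y, z) in angstroms
    Pairs farther apart than `cutoff` in the template are skipped, on the
    assumption (of this sketch) that long-range distances are less reliable.
    """
    restraints = []
    for i, j in combinations(sorted(equivalences), 2):
        d = math.dist(template_coords[equivalences[i]], template_coords[equivalences[j]])
        if d <= cutoff:
            # The target pair inherits the template distance, widened by a tolerance.
            restraints.append((i, j, max(0.0, d - tolerance), d + tolerance))
    return restraints
```

A distance geometry program would then search for target coordinates that satisfy as many of these bounds as possible; that search is outside the scope of the sketch.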

[0556] The results of the molecular modeling analysis 
may then be used in rational drug design techniques to 
identify agents which modulate the activity of the 
polypeptide codes of SEQ ID NOS. 3884-7743. 
[0557] Accordingly, another aspect of the present invention is a method of identifying a feature within the cDNA codes of SEQ ID NOs. 24-3883 and 7744-19335 or the polypeptide codes of SEQ ID NOS. 3884-7743 comprising reading the nucleic acid code(s) or the polypeptide code(s) through the use of a computer program which identifies features therein and identifying features within the nucleic acid code(s) or polypeptide code(s) with the computer program. In one embodiment, the computer program comprises a computer program which identifies open reading frames. In a further embodiment, the computer program identifies structural motifs in a polypeptide sequence. In another embodiment, the computer program comprises a molecular modeling program. The method may be performed by reading a single sequence or at least 2, 5, 10, 15, 20, 25, 30, or 50 of the cDNA codes of SEQ ID NOs. 24-3883 and 7744-19335 or the polypeptide codes of SEQ ID NOS. 3884-7743 through the use of the computer program and identifying features within the cDNA codes or polypeptide codes with the computer program.
The cDNA codes of SEQ ID NOs. 24-3883 and 7744-19335 or the polypeptide codes of SEQ ID NOS. 3884-7743 may be stored and manipulated in a variety of data processor programs in a variety of formats. For example, the cDNA codes of SEQ ID NOs. 24-3883 and 7744-19335 or the polypeptide codes of SEQ ID NOS. 3884-7743 may be stored as text in a word processing file, such as Microsoft WORD or WORDPERFECT, or as an ASCII file in a variety of database programs familiar to those of skill in the art, such as DB2, SYBASE, or ORACLE. In addition, many computer programs and databases may be used as sequence comparers, identifiers, or sources of reference nucleotide or polypeptide sequences to be compared to the cDNA codes of SEQ ID NOs. 24-3883 and 7744-19335 or the polypeptide codes of SEQ ID NOS. 3884-7743. The following list is intended not to limit the invention but to provide guidance to programs and databases which are useful with the cDNA codes of SEQ ID NOs. 24-3883 and 7744-19335 or the polypeptide codes of SEQ ID NOS. 3884-7743. The programs and databases which may be used include, but are not limited to: MacPattern (EMBL), DiscoveryBase (Molecular Applications Group), GeneMine (Molecular Applications Group), Look (Molecular Applications Group), MacLook (Molecular Applications Group), BLAST and BLAST2 (NCBI), BLASTN and BLASTX (Altschul et al., J. Mol. Biol. 215: 403 (1990)), FASTA (Pearson and Lipman, Proc. Natl. Acad. Sci. USA, 85: 2444 (1988)), FASTDB (Brutlag et al., Comp. App. Biosci. 6:237-245, 1990), Catalyst (Molecular Simulations Inc.), Catalyst/SHAPE (Molecular Simulations Inc.), Cerius2.DBAccess (Molecular Simulations Inc.), HypoGen (Molecular Simulations Inc.), Insight II (Molecular Simulations Inc.), Discover (Molecular Simulations Inc.), CHARMm (Molecular Simulations Inc.), Felix (Molecular Simulations Inc.), DelPhi (Molecular Simulations Inc.), QuanteMM (Molecular Simulations Inc.), Homology (Molecular Simulations Inc.), Modeler (Molecular Simulations Inc.), ISIS (Molecular Simulations Inc.), Quanta/Protein Design (Molecular Simulations Inc.), WebLab (Molecular Simulations Inc.), WebLab Diversity Explorer (Molecular Simulations Inc.), Gene Explorer (Molecular Simulations Inc.), SeqFold (Molecular Simulations Inc.), the EMBL/Swissprotein database, the MDL Available Chemicals Directory database, the MDL Drug Data Report database, the Comprehensive Medicinal Chemistry database, Derwent's World Drug Index database, the BioByte MasterFile database, the Genbank database, and the Genseqn database. Many other programs and databases would be apparent to one of skill in the art given the present disclosure.

[0558] Motifs which may be detected using the above programs include sequences encoding leucine zippers, helix-turn-helix motifs, glycosylation sites, ubiquitination sites, alpha helices and beta sheets, signal sequences encoding signal peptides which direct the secretion of the encoded proteins, sequences implicated in transcription regulation such as homeoboxes, acidic stretches, enzymatic active sites, substrate binding sites, and enzymatic cleavage sites.
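As one concrete case of the motifs listed above, N-linked glycosylation sites are commonly located with the consensus sequon Asn-X-Ser/Thr, where X is any residue except proline. The regular-expression sketch below is an illustrative assumption rather than one of the programs named above; it uses a zero-width lookahead so that overlapping sequons are all reported, and it returns 1-based positions of the asparagine.

```python
import re

# N, then any residue except P, then S or T; the lookahead lets matches overlap.
SEQUON = re.compile(r"(?=N[^P][ST])")


def n_glycosylation_sites(protein):
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


print(n_glycosylation_sites("MNGSANPTNNST"))  # [2, 9, 10]
```

Stricter patterns, such as the PROSITE N-glycosylation pattern, additionally exclude a proline immediately after the Ser/Thr; a match only marks a potential site, not one that is actually glycosylated.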

EXAMPLE 60 

Methods of Making Nucleic Acids 

[0559] The present invention also comprises methods 
of making the EST-related nucleic acids, fragments of 
EST-related nucleic acids, positional segments of the 
EST-related nucleic acids, or fragments of positional 
segments of the EST-related nucleic acids. The meth- 
ods comprise sequentially linking together nucleotides 
to produce the nucleic acids having the preceding sequences. A variety of methods of synthesizing nucleic acids are known to those skilled in the art.
[0560] In many of these methods, synthesis is conducted on a solid support. These include the 3' phosphoramidite methods, in which the 3' terminal base of the desired oligonucleotide is immobilized on an insoluble carrier. The nucleotide base to be added is blocked at the 5' hydroxyl and activated at the 3' hydroxyl so as to cause coupling with the immobilized nucleotide base.
Deblocking of the new immobilized nucleotide com- 
pound and repetition of the cycle will produce the de- 
sired polynucleotide. Alternatively, polynucleotides may 
be prepared as described in U.S. Patent No. 5,049,656, 
the disclosure of which is incorporated herein by refer- 
ence. In some embodiments, several polynucleotides 
prepared as described above are ligated together to 
generate longer polynucleotides having a desired se- 
quence. 
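Because the 3' terminal base is immobilized first, the chain in the cycle described above grows in the 3' to 5' direction, so bases are added in the reverse of the written 5' to 3' order. The Python sketch below only lists the steps in that order; the function name and step wording are hypothetical, and practical synthesizer cycles include further steps (such as capping and oxidation) that the text does not describe.

```python
def phosphoramidite_plan(target_5_to_3):
    """List simplified synthesis steps for a target written 5'->3'."""
    seq = target_5_to_3.upper()
    plan = [f"immobilize {seq[-1]} (3' terminal base) on support"]
    # Remaining bases are added one per cycle, walking from the 3' end to the 5' end.
    for base in reversed(seq[:-1]):
        plan.append("deblock 5'-OH of support-bound chain")
        plan.append(f"couple {base} (5'-OH blocked, 3' activated)")
    return plan


for step in phosphoramidite_plan("ATG"):
    print(step)
```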

EXAMPLE 61 

Methods of Making Polypeptides 

[0561] The present invention also comprises methods of making the polypeptides encoded by EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of the EST-related nucleic acids, or fragments of positional segments of the EST-related nucleic acids and methods of making the EST-related polypeptides, fragments of EST-related polypeptides, positional segments of EST-related polypeptides, or fragments of EST-related polypeptides. The methods comprise sequentially linking together amino acids to produce the polypeptides having the preceding sequences. In some embodiments, the polypeptides made by these methods are 150 amino acids or less in length. In other embodiments, the polypeptides made by these methods are 120 amino acids or less in length.
[0562] A variety of methods of making polypeptides are known to those skilled in the art, including methods in which the carboxyl terminal amino acid is bound to polyvinyl benzene or another suitable resin. The amino acid to be added possesses blocking groups on its amino moiety and any side chain reactive groups so that only its carboxyl moiety can react. The carboxyl group is activated with carbodiimide or another activating agent and allowed to couple to the immobilized amino acid. After removal of the blocking group, the cycle is repeated to generate a polypeptide having the desired sequence. Alternatively, the methods described in U.S. Patent No. 5,049,656, the disclosure of which is incorporated herein by reference, may be used.
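The solid-phase cycle described above builds the chain from the carboxyl terminus toward the amino terminus. A minimal Python sketch of that ordering follows, with a length check reflecting the 150 amino acid embodiment of paragraph [0561]; the function name, the step wording and the choice to raise `ValueError` are assumptions for illustration only.

```python
def peptide_synthesis_plan(peptide_n_to_c, max_length=150):
    """List simplified steps for building a peptide (written N->C), C-terminus first."""
    seq = peptide_n_to_c.upper()
    if not seq or len(seq) > max_length:
        raise ValueError(f"peptide length must be 1..{max_length}, got {len(seq)}")
    plan = [f"bind C-terminal {seq[-1]} to resin"]
    for residue in reversed(seq[:-1]):
        plan.append("remove amino blocking group from resin-bound chain")
        # Incoming residue: amino group and reactive side chains stay blocked.
        plan.append(f"activate carboxyl of {residue} (e.g. carbodiimide) and couple")
    return plan
```

Passing `max_length=120` applies the narrower embodiment instead.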
[0563] As discussed above, the EST-related nucleic acids, fragments of the EST-related nucleic acids, positional segments of the EST-related nucleic acids, or fragments of positional segments of the EST-related nucleic acids can be used for various purposes. The polynucleotides can be used to express recombinant protein for analysis, characterization or therapeutic use; for production of secreted polypeptides or chimeric polypeptides; for antibody production; as markers for tissues in which the corresponding protein is preferentially expressed (either constitutively or at a particular stage of tissue differentiation or development or in disease states); as molecular weight markers on Southern gels; as chromosome markers or tags (when labeled) to identify chromosomes or to map related gene positions; to compare with endogenous DNA sequences in patients to identify potential genetic disorders; as probes to hybridize and thus discover novel, related DNA sequences; as a source of information to derive PCR primers for genetic fingerprinting; for selecting and making oligomers for attachment to a "gene chip" or other support, including for examination of expression patterns; to raise anti-protein antibodies using DNA immunization techniques; and as an antigen to raise anti-DNA antibodies or elicit another immune response. Where the polynucleotide encodes a protein or polypeptide which binds or potentially binds to another protein or polypeptide (such as, for example, in a receptor-ligand interaction), the polynucleotide can also be used in interaction trap assays (such as, for example, that described in Gyuris et al., Cell 75:791-803 (1993), the disclosure of which is hereby incorporated by reference) to identify polynucleotides encoding the other protein or polypeptide with which binding occurs or to identify inhibitors of the binding interaction.
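Two of the uses listed above, deriving PCR primers and selecting oligomers for attachment to a gene chip, start from simple string operations on a sequence. The sketch below shows one naive way to do this in Python. The 15-nucleotide length echoes the fragment length recited in the claims, while the tiling step and the use of the two sequence ends as primer sites are assumptions; practical primer and probe design would also weigh melting temperature, GC content and self-complementarity.

```python
# Watson-Crick complement for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def tile_oligos(seq, k=15, step=5):
    """Overlapping k-mers taken every `step` bases, e.g. as candidate chip probes."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, step)]


def end_primer_pair(seq, k=15):
    """Forward primer = first k bases; reverse primer = reverse complement of last k."""
    seq = seq.upper()
    return seq[:k], reverse_complement(seq[-k:])
```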

[0564] The proteins or polypeptides provided by the present invention can similarly be used in assays to determine biological activity, including in a panel of multiple proteins for high-throughput screening; to raise antibodies or to elicit another immune response; as a reagent (including the labeled reagent) in assays designed to quantitatively determine levels of the protein (or its receptor) in biological fluids; as markers for tissues in which the corresponding protein is preferentially expressed (either constitutively or at a particular stage of tissue differentiation or development or in a disease state); and, of course, to isolate correlative receptors or ligands. Where the protein or polypeptide binds or potentially binds to another protein or polypeptide (such as, for example, in a receptor-ligand interaction), the protein can be used to identify the other protein with which binding occurs or to identify inhibitors of the binding interaction. Proteins or polypeptides involved in these binding interactions can also be used to screen for peptide or small molecule inhibitors or agonists of the binding interaction.

[0565] Any or all of these research utilities are capable 
of being developed into reagent grade or kit format for 
commercialization as research products. 
[0566] Methods for performing the uses listed above are well known to those skilled in the art. References disclosing such methods include without limitation "Molecular Cloning: A Laboratory Manual", 2d ed., Cold Spring Harbor Laboratory Press, Sambrook, J., E.F. Fritsch and T. Maniatis eds., 1989, and "Methods in Enzymology: Guide to Molecular Cloning Techniques", Academic Press, Berger, S.L. and A.R. Kimmel eds., 1987.
[0567] Polynucleotides and proteins or polypeptides 
of the present invention can also be used as nutritional 
sources or supplements. Such uses include without lim- 
itation use as a protein or amino acid supplement, use 
as a carbon source, use as a nitrogen source and use 
as a source of carbohydrate. In such cases the protein 
or polynucleotide of the invention can be added to the 
feed of a particular organism or can be administered as 
a separate solid or liquid preparation, such as in the form 
of powder, pills, solutions, suspensions or capsules. In 
the case of microorganisms, the protein or polynucle- 
otide of the invention can be added to the medium in or 
on which the microorganism is cultured. 
[0568] Although this invention has been described in 
terms of certain preferred embodiments, other embodi- 
ments which will be apparent to those of ordinary skill 
in the art in view of the disclosure herein are also within 
the scope of this invention. Accordingly, the scope of the 
invention is intended to be defined only by reference to 
the appended claims. All documents cited herein are in- 
corporated herein by reference in their entirety. 



Claims 

1. A purified nucleic acid comprising a sequence selected from the group consisting of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335 and sequences complementary to the sequences of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335.

2. A purified nucleic acid comprising at least 10 consecutive nucleotides of a sequence selected from the group consisting of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335 and sequences complementary to the sequences of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335.

3. A purified nucleic acid comprising at least 15 consecutive nucleotides of a sequence selected from the group consisting of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335 and sequences complementary to the sequences of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335.

4. A purified nucleic acid comprising the coding sequence of a sequence selected from the group consisting of SEQ ID NOs. 24-3883.

5. A purified nucleic acid comprising the full coding sequence of a sequence selected from the group consisting of SEQ ID NOs. 1339-2059 wherein the full coding sequence comprises the sequence encoding the signal peptide and the sequence encoding the mature protein.

6. A purified nucleic acid comprising a contiguous span of a sequence selected from the group consisting of SEQ ID NOs. 1339-2059 which encodes the mature protein.

7. A purified nucleic acid comprising a contiguous span of a sequence selected from the group consisting of SEQ ID NOs. 24-383 and 1339-2059 which encodes the signal peptide.

8. A purified nucleic acid encoding a polypeptide comprising a sequence selected from the group consisting of the sequences of SEQ ID NOs. 3884-7743.

9. A purified nucleic acid encoding a polypeptide comprising a sequence selected from the group consisting of the sequences of SEQ ID NOs. 5199-5919.

10. A purified nucleic acid encoding a polypeptide comprising a mature protein included in a sequence selected from the group consisting of the sequences of SEQ ID NOs. 5199-5919.

11. A purified nucleic acid encoding a polypeptide comprising a signal peptide included in a sequence selected from the group consisting of the sequences of SEQ ID NOs. 3884-4243 and 5199-5919.

12. A purified nucleic acid which hybridizes under stringent conditions to a sequence comprising at least 15 consecutive nucleotides of a sequence selected from the group consisting of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335 and sequences complementary to the sequences of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335.

13. A purified or isolated polypeptide comprising a sequence selected from the group consisting of the sequences of SEQ ID NOs. 3884-7743.

14. A purified or isolated polypeptide comprising a sequence selected from the group consisting of SEQ ID NOs. 5199-5919.

15. A purified or isolated polypeptide comprising a mature protein of a polypeptide selected from the group consisting of SEQ ID NOs. 5199-5919.

16. A purified or isolated polypeptide comprising a signal peptide of a sequence selected from the group consisting of the polypeptides of SEQ ID NOs. 3884-4243 and 5199-5919.

17. A purified or isolated polypeptide comprising at least 10 consecutive amino acids of a sequence selected from the group consisting of the sequences of SEQ ID NOs. 3884-7743.

18. A method of making a cDNA comprising the steps of:

a) contacting a collection of mRNA molecules from human cells with a primer comprising at least 15 consecutive nucleotides of a sequence selected from the group consisting of the sequences complementary to SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335;

b) hybridizing said primer to an mRNA in said collection that encodes said protein;

c) reverse transcribing said hybridized primer to make a first cDNA strand from said mRNA;

d) making a second cDNA strand complementary to said first cDNA strand; and

e) isolating the resulting cDNA encoding said protein comprising said first cDNA strand and said second cDNA strand.

19. A purified cDNA obtainable by the method of Claim 18.

20. The cDNA of Claim 19 wherein said cDNA encodes at least a portion of a human polypeptide.

21. A method of making a cDNA comprising the steps of:

a) contacting a cDNA collection with a detectable probe comprising at least 15 consecutive nucleotides of a sequence selected from the group consisting of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335 and the sequences complementary to SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335 under conditions which permit said probe to hybridize to said cDNA;

b) identifying a cDNA which hybridizes to said detectable probe; and

c) isolating said cDNA which hybridizes to said probe.

22. A purified cDNA obtainable by the method of Claim 21.

23. The cDNA of Claim 22 wherein said cDNA encodes at least a portion of a human polypeptide.

24. A method of making a cDNA comprising the steps of:

a) contacting a collection of mRNA molecules from human cells with a first primer capable of hybridizing to the polyA tail of said mRNA;

b) hybridizing said first primer to said polyA tail;

c) reverse transcribing said mRNA to make a first cDNA strand;

d) making a second cDNA strand complementary to said first cDNA strand using at least one primer comprising at least 15 consecutive nucleotides of a sequence selected from the group consisting of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335; and

25. A purified cDNA obtainable by the method of Claim 24.

26. The cDNA of Claim 25 wherein said cDNA encodes at least a portion of a human polypeptide.

27. The method of Claim 24, wherein the second cDNA strand is made by:

a) contacting said first cDNA strand with a second primer comprising at least 15 consecutive nucleotides of a sequence selected from the group consisting of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335 and a third primer which sequence is fully included within the sequence of said first primer;

b) performing a first polymerase chain reaction with said second and third primers to generate a second cDNA strand.

28. A purified cDNA obtainable by the method of Claim 27.

29. The cDNA of Claim 28 wherein said cDNA encodes at least a portion of a human polypeptide.

30. The method of Claim 24, wherein the second cDNA strand is made by:

a) contacting said first cDNA strand with a second primer comprising at least 15 consecutive nucleotides of a sequence selected from the group consisting of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335 and a third primer which sequence is fully included within the sequence of said first primer;

b) performing a first polymerase chain reaction with said second and third primers to generate a first PCR product;

c) contacting said first PCR product with a fourth primer comprising at least 15 consecutive nucleotides of said sequence selected from the group consisting of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335, and a fifth primer which sequence is fully included within the sequence of said third primer, wherein said fourth and fifth primers hybridize to sequences within said first PCR product; and

d) performing a second polymerase chain reaction with the fourth and fifth primers, thereby generating a second PCR product.

31. A purified cDNA obtainable by the method of Claim 30.

32. The cDNA of Claim 31 wherein said cDNA encodes at least a portion of a human polypeptide.

33. The method of Claim 24 wherein the second cDNA strand is made by:

a) contacting said first cDNA strand with a second primer comprising at least 15 consecutive nucleotides of a sequence selected from the group consisting of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335;

b) hybridizing said second primer to said first strand cDNA; and

c) extending said hybridized second primer to generate said second cDNA strand.

34. A purified cDNA obtainable by the method of Claim 33.

35. The cDNA of Claim 34, wherein said cDNA encodes at least a portion of a human polypeptide.

36. A method of making a polypeptide comprising the steps of:

a) obtaining a cDNA which encodes a polypeptide encoded by a nucleic acid comprising a sequence selected from the group consisting of SEQ ID NOs. 24-3883 or a cDNA which encodes a polypeptide comprising at least 10 consecutive amino acids of a polypeptide encoded by a sequence selected from the group consisting of SEQ ID NOs. 24-3883;

b) inserting said cDNA in an expression vector such that said cDNA is operably linked to a promoter;

c) introducing said expression vector into a host cell whereby said host cell produces the protein encoded by said cDNA; and

d) isolating said protein.

37. An isolated protein obtainable by the method of Claim 36.

38. A method of obtaining a promoter DNA comprising the steps of:

a) obtaining genomic DNA located upstream of a nucleic acid comprising a sequence selected from the group consisting of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335 and the sequences complementary to the sequences of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335;

b) screening the upstream genomic DNA to identify a promoter capable of directing transcription initiation; and

c) isolating the upstream genomic DNA comprising the promoter.

39. The method of Claim 38, wherein said obtaining step comprises walking from genomic DNA comprising a sequence selected from the group consisting of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335 and the sequences complementary to SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335.

40. The method of Claim 39, wherein said screening step comprises inserting genomic DNA located upstream of a sequence selected from the group consisting of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335 and the sequences complementary to SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335 into a promoter reporter vector.

41. The method of Claim 39, wherein said screening step comprises identifying motifs in genomic DNA located upstream of a sequence selected from the group consisting of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335 and the sequences complementary to SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335 which are transcription factor binding sites or transcription start sites.

42. An isolated promoter obtainable by the method of Claim 38.

43. In an array of discrete ESTs or fragments thereof of at least 15 nucleotides in length, the improvement comprising inclusion in said array of at least one sequence selected from the group consisting of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335, the sequences complementary to the sequences of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335 and fragments comprising at least 15 consecutive nucleotides of said sequence.

44. The array of Claim 43 including therein at least two sequences selected from the group consisting of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335, the sequences complementary to the sequences of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335, and fragments comprising at least 15 consecutive nucleotides of said sequences.

45. The array of Claim 43 including therein at least five sequences selected from the group consisting of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335, the sequences complementary to the sequences of SEQ ID NOs. 24-3883 and SEQ ID NOs. 7744-19335 and fragments comprising at least 15 consecutive nucleotides of said sequences.


46. An enriched population of recombinant nucleic ac- 
ids, said recombinant nucleic acids comprising an 
insert nucleic acid and a backbone nucleic acid, 
wherein at least 5% of said insert nucleic acids in 
said population comprise a sequence selected from 
the group consisting of SEQ ID NOs. 24-3883 and 
SEQ ID NOs. 7744-19335 and the sequences com-
plementary to SEQ ID NOs. 24-3883 and SEQ ID 
NOs. 7744-19335. 

47. A purified or isolated antibody capable of specifical- 
ly binding to a polypeptide comprising a sequence 
selected from the group consisting of SEQ ID NOs. 
3884-7743. 

48. A purified or isolated antibody capable of specifical- 
ly binding to a polypeptide comprising at least 10 
consecutive amino acids of a sequence selected 
from the group consisting of SEQ ID NOs. 
3884-7743. 

49. An antibody composition capable of selectively 
binding to an epitope-containing fragment of a 
polypeptide comprising a contiguous span of at 
least 8 amino acids of any of SEQ ID NOs. 
3884-7743, wherein said antibody is polyclonal or 
monoclonal. 

50. A computer readable medium having stored thereon a sequence selected from the group consisting of a nucleic acid code of SEQ ID NOs. 24-3883 and 7744-19335 and a polypeptide code of SEQ ID NOs. 3884-7743.

51. A computer system comprising a processor and a data storage device wherein said data storage device has stored thereon a sequence selected from the group consisting of a nucleic acid code of SEQ ID NOs. 24-3883 and 7744-19335 and a polypeptide code of SEQ ID NOs. 3884-7743.

52. The computer system of Claim 51 further comprising a sequence comparer and a data storage device having reference sequences stored thereon.

53. The computer system of Claim 52 wherein said sequence comparer comprises a computer program which indicates polymorphisms.

54. The computer system of Claim 51 further comprising an identifier which identifies features in said sequence.

55. A method for comparing a first sequence to a reference sequence wherein said first sequence is selected from the group consisting of a nucleic acid code of SEQ ID NOs. 24-3883 and 7744-19335 and a polypeptide code of SEQ ID NOs. 3884-7743 comprising the steps of:

a) reading said first sequence and said reference sequence through use of a computer program which compares sequences; and

b) determining differences between said first sequence and said reference sequence with said computer program.

56. The method of Claim 55, wherein said step of determining differences between the first sequence and the reference sequence comprises identifying polymorphisms.

57. A method for identifying a feature in a sequence selected from the group consisting of a nucleic acid code of SEQ ID NOs. 24-3883 and 7744-19335 and a polypeptide code of SEQ ID NOs. 3884-7743 comprising the steps of:

a) reading said sequence through the use of a computer program which identifies features in sequences; and

b) identifying features in said sequence with said computer program.

58. A vector comprising a nucleic acid according to any 
one of Claims 1-12. 

50 

59. A host cell containing a nucleic acid of Claim 58. 

60. A method of making a nucleic acid of any one of 
Claims 1-12 comprising the steps of: 

55 

a) introducing said nucleic acid into a host cell 
such that said nucleic acid is present in multiple 
copies in each host cell; and 
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b) isolating said nucleic acid from said host cell. 

61. A method of making a nucleic acid of any one of Claims 1-12, comprising the step of sequentially linking together the nucleotides in said nucleic acid.

62. A method of making a polypeptide of any one of Claims 13-17, wherein said polypeptide is 150 amino acids in length or less, comprising the step of sequentially linking together the amino acids in said polypeptide.

63. A method of making a polypeptide of any one of Claims 13-17, wherein said polypeptide is 120 amino acids in length or less, comprising the step of sequentially linking together the amino acids in said polypeptide.
Description of Transcription Factor Binding Sites Present on Promoters Isolated From SignalTag Sequences



Promoter sequence P13H2 (548 bp): 
Promoter sequence P29B6 (555 bp):
The Search Division considers that the present European patent application does not comply with the requirements of unity of invention and relates to several inventions or groups of inventions, namely:

1. Claims: 1-4, 8, 11-13, 16-63 (partially)

A purified nucleic acid comprising a sequence consisting of SEQ ID No. 24 and a sequence complementary to said sequence; a purified nucleic acid encoding a polypeptide comprising a signal peptide included in a sequence selected from SEQ ID No. 3884; a purified polypeptide of SEQ ID No. 3884; a method of making said nucleic acid and/or polypeptide; a method of obtaining a promoter DNA upstream of said nucleic acid by screening the upstream genomic DNA; an isolated promoter obtainable by said method; an array of ESTs comprising SEQ ID No. 24 or fragments thereof of at least 15 nucleotides in length; a purified or isolated antibody capable of binding to said polypeptide; a computer-readable medium and a computer system having stored thereon and/or utilising a sequence selected from SEQ ID Nos. 24 and 3884; a vector comprising said nucleic acid; a host cell comprising said vector; a method of making said polypeptide.



2. Claims: 1-63 (partially, as far as applicable)

Idem as subject 1, but limited to SEQ ID Nos. 25 and 3885.

(Invention 2 is limited to SEQ ID Nos. 25 and 3885;
Invention 3 is limited to SEQ ID Nos. 26 and 3886;
Invention 3860 is limited to SEQ ID Nos. 3883 and 7743;
Invention 3861 is limited to SEQ ID No. 7744;
Invention 15452 is limited to SEQ ID No. 19335.)



For the sake of conciseness, the first invention is explicitly defined; the other subject matters are defined by analogy thereto.



Due to the great number of DNA and amino acid sequences in the present application and the lack of indications in the application, it was not possible in each case to verify which amino acid sequence corresponds to which DNA sequence. It was, however, assumed that the sequential order of the DNA and amino acid sequences corresponds, so that DNA sequence 24 expressed amino acid sequence 3884, DNA sequence 25 expressed 3885, and DNA sequence 3883 expressed 7743. For DNA sequences 7744-19335, it was assumed that no amino acid sequences were claimed.
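The invention numbering above follows from the Search Division's assumption that the DNA and amino acid sequences are paired in sequential order: DNA sequence n expresses polypeptide n + 3860 for n = 24 to 3883, and DNA sequences 7744 to 19335 have no claimed polypeptide. A minimal sketch of that arithmetic is given below. The function and constant names are hypothetical, and the code only reproduces the stated assumption; it does not verify the actual correspondence, which the report notes could not be checked.

```python
from typing import Optional, Tuple

FIRST_DNA = 24            # first nucleic acid code (SEQ ID No. 24)
LAST_PAIRED_DNA = 3883    # last nucleic acid code assumed to have a polypeptide
PROTEIN_OFFSET = 3860     # assumed pairing: DNA n -> polypeptide n + 3860
FIRST_UNPAIRED_DNA = 7744
LAST_DNA = 19335
PAIRED = LAST_PAIRED_DNA - FIRST_DNA + 1                # 3860 inventions with a polypeptide
TOTAL = PAIRED + (LAST_DNA - FIRST_UNPAIRED_DNA + 1)    # 15452 inventions in all


def polypeptide_for(dna_id: int) -> Optional[int]:
    """SEQ ID of the polypeptide assumed to be expressed by a DNA SEQ ID."""
    if FIRST_DNA <= dna_id <= LAST_PAIRED_DNA:
        return dna_id + PROTEIN_OFFSET
    if FIRST_UNPAIRED_DNA <= dna_id <= LAST_DNA:
        return None  # no amino acid sequence assumed to be claimed
    raise ValueError(f"SEQ ID {dna_id} is not a claimed nucleic acid code")


def invention_subject(k: int) -> Tuple[int, Optional[int]]:
    """(DNA SEQ ID, polypeptide SEQ ID or None) that invention k is limited to."""
    if 1 <= k <= PAIRED:
        dna_id = FIRST_DNA + k - 1
    elif PAIRED < k <= TOTAL:
        dna_id = FIRST_UNPAIRED_DNA + (k - PAIRED - 1)
    else:
        raise ValueError(f"invention {k} is outside 1..{TOTAL}")
    return dna_id, polypeptide_for(dna_id)
```

Under this assumption, `invention_subject(3860)` returns `(3883, 7743)` and `invention_subject(15452)` returns `(19335, None)`, consistent with the list of inventions in the report.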